MSPT

Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

Pedro M. P. Curvo  |  Jan-Willem van de Meent  |  Maksim Zhdanov

University of Amsterdam

Abstract

A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. MSPT combines local point attention within patches with global attention to pooled patch-level representations. To partition irregular geometries into spatially coherent patches, MSPT uses ball trees. This dual-scale design enables scaling to millions of points on a single GPU while maintaining high accuracy on PDE and CFD benchmarks.

MSPT overview figure with memory and latency comparison

Parallelized Multi-Scale Attention (PMSA). MSPT combines local patch attention with global supernode communication, targeting industrial-scale simulations while reducing memory and latency.

Contributions

  • Introduces Parallelized Multi-Scale Attention (PMSA) to process local patch interactions and global cross-patch interactions in one unified attention operation.
  • Proposes MSPT, a multi-block transformer architecture for arbitrary geometries and varying resolutions via ball-tree partitioning and supernode pooling.
  • Demonstrates strong benchmark performance across standard PDE tasks and large-scale aerodynamic datasets (ShapeNet-Car, AhmedML), with favorable efficiency scaling.

Method

Given N input points with features, MSPT partitions the domain into K = N/L spatially coherent patches of size L using a ball tree. Each patch is pooled into Q supernodes. Attention is then computed over augmented token sets that concatenate a patch's L local tokens with all KQ pooled tokens, so every point models fine local detail and receives long-range context in a single attention operation.
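The augmented-token attention above can be sketched in a few lines of NumPy. This is an illustrative toy, not the released implementation: a recursive median split stands in for the ball tree, supernodes are plain feature means, and the query/key/value projections are identity maps (the real model learns all of these and uses multi-head attention).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def partition(points, L):
    """Recursively median-split along the widest axis until every patch
    has at most L points. A simplified stand-in for the ball tree."""
    stack, patches = [np.arange(len(points))], []
    while stack:
        ids = stack.pop()
        if len(ids) <= L:
            patches.append(ids)
            continue
        axis = np.ptp(points[ids], axis=0).argmax()   # dimension of widest spread
        order = ids[np.argsort(points[ids, axis])]
        mid = len(order) // 2
        stack += [order[:mid], order[mid:]]
    return patches

def pmsa(points, feats, L=64, Q=4):
    """Augmented-token attention: each point attends to its own patch's
    tokens plus Q mean-pooled supernodes from every patch. Assumes L >= 2Q
    so no supernode chunk is empty."""
    patches = partition(points, L)
    # supernodes: split each patch into Q chunks and mean-pool their features
    supers = np.concatenate([
        np.stack([feats[c].mean(axis=0) for c in np.array_split(p, Q)])
        for p in patches
    ])                                                # shape (K*Q, d)
    d = feats.shape[1]
    out = np.empty_like(feats)
    for p in patches:
        kv = np.concatenate([feats[p], supers])       # local + global tokens
        attn = softmax(feats[p] @ kv.T / np.sqrt(d))
        out[p] = attn @ kv                            # (len(p), d)
    return out
```

Note the attention span of each point: its own patch (at most L tokens) plus the KQ supernodes, rather than all N points as in dense attention.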

The resulting complexity is O(NL + N^2 Q/L): the first term covers local attention within patches of size L, the second covers attention from all N points to the KQ pooled tokens. In practical settings where KQ << N (equivalently, Q << L), this provides a favorable trade-off between local fidelity and scalable global context.
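To see what this bound buys, a quick back-of-the-envelope count of attended token pairs versus dense global attention (the parameter values below are illustrative, not the paper's settings):

```python
# Token-pair counts for dense attention vs. the PMSA bound O(NL + N^2 Q / L)
N, L, Q = 1_000_000, 256, 8            # points, patch size, supernodes per patch
dense = N * N                          # every point attends to every point
pmsa = N * L + (N * N * Q) // L        # local patches + attention to K*Q pooled tokens
print(f"dense : {dense:.2e}")
print(f"pmsa  : {pmsa:.2e}")
print(f"ratio : {dense / pmsa:.1f}x fewer pairs")
```

With these values the global term N^2 Q/L dominates, and the pair count drops by roughly 30x relative to dense attention.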

MSPT block diagram
MSPT Demo

Main Results

Relative L2 error (×10^-2; lower is better). A "/" marks results not reported for that model.

PDE Standard Bench

| Model | Elasticity | Plasticity | Airfoil | Pipe | Navier-Stokes | Darcy |
| --- | --- | --- | --- | --- | --- | --- |
| FNO | / | / | / | / | 15.56 | 1.08 |
| MWT | 3.59 | 0.76 | 0.75 | 0.77 | 15.41 | 0.82 |
| U-FNO | 2.39 | 0.39 | 2.69 | 0.56 | 22.31 | 1.83 |
| geo-FNO | 2.29 | 0.74 | 1.38 | 0.67 | 15.56 | 1.08 |
| U-NO | 2.58 | 0.34 | 0.78 | 1.00 | 17.13 | 1.13 |
| F-FNO | 2.63 | 0.47 | 0.78 | 0.70 | 23.22 | 0.77 |
| LSM | 2.18 | 0.25 | 0.59 | 0.50 | 15.35 | 0.65 |
| Galerkin | 2.40 | 1.20 | 1.18 | 0.98 | 14.01 | 0.84 |
| HT-Net | / | 3.33 | 0.65 | 0.59 | 18.47 | 0.79 |
| OFormer | 1.83 | 0.17 | 1.83 | 1.68 | 17.05 | 1.24 |
| GNOT | 0.86 | 3.36 | 0.76 | 0.47 | 13.80 | 1.05 |
| FactFormer | / | 3.12 | 0.71 | 0.60 | 12.14 | 1.09 |
| ONO | 1.18 | 0.48 | 0.61 | 0.52 | 11.95 | 0.76 |
| Transolver | 0.64 | 0.12 | 0.53 | 0.33 | 9.00 | 0.57 |
| Erwin | 0.34 | 0.10 | 2.57 | 0.61 | / | / |
| MSPT (Ours) | 0.48 | 0.10 | 0.51 | 0.31 | 6.32 | 0.63 |
| Relative Promotion | -41% | +17% | +4% | +6% | +30% | -10% |

ShapeNet-Car

| Model | Volume | Surf | C_D | ρ_D |
| --- | --- | --- | --- | --- |
| Simple MLP | 5.12 | 13.04 | 3.07 | 94.96 |
| GraphSAGE | 4.61 | 10.50 | 2.70 | 96.95 |
| PointNet | 4.94 | 11.04 | 2.98 | 95.83 |
| Graph U-Net | 4.71 | 11.02 | 2.26 | 97.25 |
| MeshGraphNet | 3.54 | 7.81 | 1.68 | 98.40 |
| GNO | 3.83 | 8.15 | 1.72 | 98.34 |
| Galerkin | 3.39 | 8.78 | 1.79 | 97.64 |
| geo-FNO | 16.70 | 23.78 | 6.64 | 82.80 |
| GNOT | 3.29 | 7.98 | 1.78 | 98.33 |
| GINO | 3.86 | 8.10 | 1.84 | 98.26 |
| 3D-GEOCA | 3.19 | 7.79 | 1.59 | 98.42 |
| Transolver | 2.07 | 7.45 | 1.03 | 99.35 |
| MSPT (Ours) | 1.89 | 7.41 | 0.98 | 99.41 |
| Relative Promotion | +8.7% | +0.5% | +4.9% | +0.06% |
| AB-UPT | 1.16 | 4.81 | / | / |
| AB-UPT (repr.) | 2.51 | 7.67 | 2.20 | 97.48 |

Volume and Surf are field errors; C_D is the drag coefficient error; ρ_D is the correlation of predicted drag (higher is better).

AhmedML

| Model | Volume | Surf |
| --- | --- | --- |
| PointNet | 5.44 | 8.02 |
| Graph U-Net | 4.15 | 6.46 |
| GINO | 6.23 | 7.90 |
| LNO | 7.59 | 12.95 |
| UPT | 2.73 | 4.25 |
| OFormer | 3.63 | 4.12 |
| Transolver | 2.05 | 3.45 |
| Transformer | 2.09 | 3.41 |
| MSPT (Ours) | 2.04 | 3.22 |
| Relative Promotion | +0.49% | +6.67% |
| AB-UPT | 1.90 | 3.01 |
| AB-UPT (repr.) | 2.39 | 4.33 |

Ablation and Efficiency

Pooling and supernode ablation

Pooling and number of supernodes (Q) ablation on ShapeNet-Car.

Memory and speed comparison by point count

Memory and runtime scaling as the number of points increases.

Quick Start

git clone --recurse-submodules https://github.com/pedrocurvo/mspt.git
cd mspt/Neural-Solver-Library-MSPT
pip install -r requirements.txt
bash ./scripts/StandardBench/plasticity/MSPT.sh

Citation

@misc{curvo2025msptefficientlargescalephysical,
  title={MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention},
  author={Pedro M. P. Curvo and Jan-Willem van de Meent and Maksim Zhdanov},
  year={2025},
  eprint={2512.01738},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.01738},
}