CVPR 2026 Highlight

    MSPT

    Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

    Pedro M. P. Curvo · Jan-Willem van de Meent · Maksim Zhdanov

    University of Amsterdam

    Abstract

    Scaling neural solvers without losing global structure

    A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. MSPT combines local point attention within patches with global attention to pooled patch-level representations. To partition irregular geometries into spatially coherent patches, MSPT uses ball trees. This dual-scale design enables scaling to millions of points on a single GPU while maintaining high accuracy on PDE and CFD benchmarks.

    MSPT overview figure with memory and latency comparison
    Parallelized Multi-Scale Attention. Local patch attention and pooled global communication target industrial-scale simulations while reducing memory and latency.

    Contributions

    Introduces Parallelized Multi-Scale Attention (PMSA), combining local patch attention with global cross-patch communication in a single attention pattern.

    Builds MSPT as a multi-block transformer for arbitrary geometries and changing resolutions using ball-tree partitioning and supernode pooling.

    Shows strong performance on standard PDE tasks and large-scale aerodynamic datasets while keeping memory and runtime scaling practical.

    Method

    Patch locally, communicate globally

    Given point features, MSPT partitions the domain into K patches of size L. Each patch is pooled into Q supernodes. Attention is then computed on augmented tokens that include both local patch tokens and global pooled tokens, enabling simultaneous local detail modeling and long-range communication.

    • Partition N spatial elements into K coherent patches using ball trees.
    • Pool each patch into Q supernodes to expose long-range structure without paying full global-attention cost.
    • Run attention over local tokens plus pooled global tokens, yielding complexity O(NL + N^2Q/L).
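The partitioning step above can be illustrated with a simplified median-split scheme of the kind used in ball-tree construction. This is a hypothetical sketch, not the authors' implementation; `ball_tree_partition` and all names are assumptions, and real ball trees additionally track bounding balls per node:

```python
import numpy as np

def ball_tree_partition(points, patch_size):
    """Recursively split points along the widest axis at the median
    (a simplified ball-tree-style construction) until each leaf holds
    at most `patch_size` points. Returns a list of index arrays, one
    per spatially coherent patch."""
    idx = np.arange(len(points))

    def split(ids):
        if len(ids) <= patch_size:
            return [ids]
        pts = points[ids]
        axis = np.argmax(pts.max(0) - pts.min(0))     # axis of widest spread
        order = ids[np.argsort(pts[:, axis])]         # sort along that axis
        mid = len(order) // 2                         # median split
        return split(order[:mid]) + split(order[mid:])

    return split(idx)

rng = np.random.default_rng(0)
pts = rng.random((4096, 3))                # N = 4096 points in 3D
patches = ball_tree_partition(pts, 64)     # yields K = 64 patches of size L = 64
```

Because median splits halve each subset, all patches end up balanced, which keeps the downstream patch attention uniform in cost.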
    MSPT block diagram
    Animated walkthrough of the MSPT sequence: point sampling, patch partitioning, pooled supernode markers, and attention flow.
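The augmented-token attention described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under assumed simplifications (points pre-grouped by patch, uniform mean pooling into supernodes), not the authors' implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def pmsa(x, K, L, Q):
    """Parallelized multi-scale attention, single-head sketch.
    x: (N, d) point features, rows [k*L:(k+1)*L] forming patch k.
    Each patch is mean-pooled into Q supernodes; every token attends
    to its own patch plus all K*Q pooled global tokens."""
    N, d = x.shape
    patches = x.reshape(K, L, d)
    # pool each patch into Q supernodes (here: mean over Q equal chunks)
    supernodes = patches.reshape(K, Q, L // Q, d).mean(axis=2)   # (K, Q, d)
    global_tokens = supernodes.reshape(K * Q, d)                 # shared by all patches
    out = np.empty_like(x)
    for k in range(K):
        q = patches[k]                                    # queries: (L, d)
        kv = np.concatenate([patches[k], global_tokens])  # local + global keys/values
        attn = softmax(q @ kv.T / np.sqrt(d))             # (L, L + K*Q) scores
        out[k * L:(k + 1) * L] = attn @ kv
    return out

x = np.random.default_rng(0).standard_normal((32, 16))
y = pmsa(x, K=4, L=8, Q=2)    # 32 points, 4 patches of 8, 2 supernodes each
```

Each query row scores L local entries plus K·Q global entries, which is exactly the O(NL + N²Q/L) pattern stated above; the per-patch loop is what a batched implementation would parallelize.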

    Complexity

    O(NL + N^2Q/L)

    With practical settings where KQ << N, the architecture preserves local fidelity while keeping global context affordable.
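Plugging representative sizes into the formula makes the saving concrete. The numbers below are illustrative assumptions, not the paper's benchmark settings; the count is of attention-score entries:

```python
# Illustrative token-count comparison: dense global attention vs. PMSA.
N, L, Q = 2 ** 20, 256, 8          # ~1M points, patch size 256, 8 supernodes
K = N // L                          # number of patches

full_cost = N * N                   # dense global attention scores
pmsa_cost = N * L + N * K * Q       # local term + pooled-global term
# N*K*Q equals N^2*Q/L, matching O(NL + N^2*Q/L)

print(full_cost / pmsa_cost)        # ≈ 31.8x fewer score entries
```

Here K·Q = 32,768 pooled tokens against N ≈ 10⁶ points, so the KQ << N condition holds and the global term stays far below dense attention.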

    Main results

    Benchmark performance across PDE and aerodynamic tasks

    PDE Standard Bench

    Relative L2 error where lower is better

    Model              | Elasticity | Plasticity | Airfoil | Pipe | Navier-Stokes | Darcy
    FNO                | /          | /          | /       | /    | 15.56         | 1.08
    MWT                | 3.59       | 0.76       | 0.75    | 0.77 | 15.41         | 0.82
    U-FNO              | 2.39       | 0.39       | 2.69    | 0.56 | 22.31         | 1.83
    geo-FNO            | 2.29       | 0.74       | 1.38    | 0.67 | 15.56         | 1.08
    U-NO               | 2.58       | 0.34       | 0.78    | 1.00 | 17.13         | 1.13
    F-FNO              | 2.63       | 0.47       | 0.78    | 0.70 | 23.22         | 0.77
    LSM                | 2.18       | 0.25       | 0.59    | 0.50 | 15.35         | 0.65
    Galerkin           | 2.40       | 1.20       | 1.18    | 0.98 | 14.01         | 0.84
    HT-Net             | /          | 3.33       | 0.65    | 0.59 | 18.47         | 0.79
    OFormer            | 1.83       | 0.17       | 1.83    | 1.68 | 17.05         | 1.24
    GNOT               | 0.86       | 3.36       | 0.76    | 0.47 | 13.80         | 1.05
    FactFormer         | /          | 3.12       | 0.71    | 0.60 | 12.14         | 1.09
    ONO                | 1.18       | 0.48       | 0.61    | 0.52 | 11.95         | 0.76
    Transolver         | 0.64       | 0.12       | 0.53    | 0.33 | 9.00          | 0.57
    Erwin              | 0.34       | 0.10       | 2.57    | 0.61 | N/A           | N/A
    MSPT (Ours)        | 0.48       | 0.10       | 0.51    | 0.31 | 6.32          | 0.63
    Relative Promotion | -41%       | 17%        | 4%      | 6%   | 30%           | -10%

    ShapeNet-Car

    Relative L2 error where lower is better; rho_D is higher-better

    Model              | Volume | Surf   | C_D   | rho_D
    Simple MLP         | 5.12   | 13.04  | 3.07  | 94.96
    GraphSAGE          | 4.61   | 10.50  | 2.70  | 96.95
    PointNet           | 4.94   | 11.04  | 2.98  | 95.83
    Graph U-Net        | 4.71   | 11.02  | 2.26  | 97.25
    MeshGraphNet       | 3.54   | 7.81   | 1.68  | 98.40
    GNO                | 3.83   | 8.15   | 1.72  | 98.34
    GALERKIN           | 3.39   | 8.78   | 1.79  | 97.64
    GEO-FNO            | 16.70  | 23.78  | 6.64  | 82.80
    GNOT               | 3.29   | 7.98   | 1.78  | 98.33
    GINO               | 3.86   | 8.10   | 1.84  | 98.26
    3D-GEOCA           | 3.19   | 7.79   | 1.59  | 98.42
    Transolver         | 2.07   | 7.45   | 1.03  | 99.35
    MSPT (Ours)        | 1.89   | 7.41   | 0.98  | 99.41
    Relative Promotion | +8.7%  | +0.5%  | +4.9% | +0.06%
    AB-UPT             | 1.16   | 4.81   | N/A   | N/A
    AB-UPT (repr.)     | 2.51   | 7.67   | 2.20  | 97.48

    AhmedML

    Relative L2 error where lower is better

    Model              | Volume | Surf
    PointNet           | 5.44   | 8.02
    Graph U-Net        | 4.15   | 6.46
    GINO               | 6.23   | 7.90
    LNO                | 7.59   | 12.95
    UPT                | 2.73   | 4.25
    OFormer            | 3.63   | 4.12
    Transolver         | 2.05   | 3.45
    Transformer        | 2.09   | 3.41
    MSPT (Ours)        | 2.04   | 3.22
    Relative Promotion | +0.49% | +6.67%
    AB-UPT             | 1.90   | 3.01
    AB-UPT (repr.)     | 2.39   | 4.33

    Ablation and efficiency

    Where the gains come from

    Pooling and supernode ablation on ShapeNet-Car
    Pooling and number of supernodes (Q) ablation on ShapeNet-Car.
    Memory and runtime scaling by point count
    Memory and runtime scaling as the number of points increases.

    Quick Start

    git clone --recurse-submodules https://github.com/pedrocurvo/mspt.git
    cd mspt/Neural-Solver-Library-MSPT
    pip install -r requirements.txt
    bash ./scripts/StandardBench/plasticity/MSPT.sh

    Citation

    @InProceedings{curvo2026mspt,
      title={MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention},
      author={Pedro M. P. Curvo and Jan-Willem van de Meent and Maksim Zhdanov},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2026},
      url={https://arxiv.org/abs/2512.01738},
    }
