MSPT

Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

Pedro M. P. Curvo  |  Jan-Willem van de Meent  |  Maksim Zhdanov

University of Amsterdam

Abstract

A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. MSPT combines local point attention within patches with global attention to pooled patch-level representations. To partition irregular geometries into spatially coherent patches, MSPT uses ball trees. This dual-scale design enables scaling to millions of points on a single GPU while maintaining high accuracy on PDE and CFD benchmarks.

MSPT overview figure with memory and latency comparison

Parallelized Multi-Scale Attention (PMSA). MSPT combines local patch attention with global supernode communication, targeting industrial-scale simulations while reducing memory and latency.

Contributions

  • Introduces Parallelized Multi-Scale Attention (PMSA) to process local patch interactions and global cross-patch interactions in one unified attention operation.
  • Proposes MSPT, a multi-block transformer architecture for arbitrary geometries and varying resolutions via ball-tree partitioning and supernode pooling.
  • Demonstrates strong benchmark performance across standard PDE tasks and large-scale aerodynamic datasets (ShapeNet-Car, AhmedML), with favorable efficiency scaling.

Method

Given N input points with features, MSPT partitions the domain into K = N/L spatially coherent patches of size L using a ball tree. Each patch is pooled into Q supernodes. Attention is then computed over augmented token sets that concatenate a patch's L local tokens with all KQ pooled tokens, so every point models fine local detail and receives long-range context in a single attention operation.
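The augmented-token attention above can be sketched in a few lines of NumPy. This is an illustrative toy, not the released implementation: a recursive median split stands in for the ball tree, supernodes are plain feature means, and the query/key/value projections are identity maps (the real model learns all of these and uses multi-head attention).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def partition(points, L):
    """Recursively median-split along the widest axis until every patch
    has at most L points. A simplified stand-in for the ball tree."""
    stack, patches = [np.arange(len(points))], []
    while stack:
        ids = stack.pop()
        if len(ids) <= L:
            patches.append(ids)
            continue
        axis = np.ptp(points[ids], axis=0).argmax()   # dimension of widest spread
        order = ids[np.argsort(points[ids, axis])]
        mid = len(order) // 2
        stack += [order[:mid], order[mid:]]
    return patches

def pmsa(points, feats, L=64, Q=4):
    """Augmented-token attention: each point attends to its own patch's
    tokens plus Q mean-pooled supernodes from every patch. Assumes L >= 2Q
    so no supernode chunk is empty."""
    patches = partition(points, L)
    # supernodes: split each patch into Q chunks and mean-pool their features
    supers = np.concatenate([
        np.stack([feats[c].mean(axis=0) for c in np.array_split(p, Q)])
        for p in patches
    ])                                                # shape (K*Q, d)
    d = feats.shape[1]
    out = np.empty_like(feats)
    for p in patches:
        kv = np.concatenate([feats[p], supers])       # local + global tokens
        attn = softmax(feats[p] @ kv.T / np.sqrt(d))
        out[p] = attn @ kv                            # (len(p), d)
    return out
```

Note the attention span of each point: its own patch (at most L tokens) plus the KQ supernodes, rather than all N points as in dense attention.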

The resulting complexity is O(NL + N^2 Q/L): the first term covers local attention within patches of size L, the second covers attention from all N points to the KQ pooled tokens. In practical settings where KQ << N (equivalently, Q << L), this provides a favorable trade-off between local fidelity and scalable global context.
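To see what this bound buys, a quick back-of-the-envelope count of attended token pairs versus dense global attention (the parameter values below are illustrative, not the paper's settings):

```python
# Token-pair counts for dense attention vs. the PMSA bound O(NL + N^2 Q / L)
N, L, Q = 1_000_000, 256, 8            # points, patch size, supernodes per patch
dense = N * N                          # every point attends to every point
pmsa = N * L + (N * N * Q) // L        # local patches + attention to K*Q pooled tokens
print(f"dense : {dense:.2e}")
print(f"pmsa  : {pmsa:.2e}")
print(f"ratio : {dense / pmsa:.1f}x fewer pairs")
```

With these values the global term N^2 Q/L dominates, and the pair count drops by roughly 30x relative to dense attention.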

MSPT block diagram
MSPT Demo

Main Results

Relative L2 error (×10^-2; lower is better). A "/" marks results not reported for that model.

PDE Standard Bench

| Model | Elasticity | Plasticity | Airfoil | Pipe | Navier-Stokes | Darcy |
| --- | --- | --- | --- | --- | --- | --- |
| FNO | / | / | / | / | 15.56 | 1.08 |
| MWT | 3.59 | 0.76 | 0.75 | 0.77 | 15.41 | 0.82 |
| U-FNO | 2.39 | 0.39 | 2.69 | 0.56 | 22.31 | 1.83 |
| geo-FNO | 2.29 | 0.74 | 1.38 | 0.67 | 15.56 | 1.08 |
| U-NO | 2.58 | 0.34 | 0.78 | 1.00 | 17.13 | 1.13 |
| F-FNO | 2.63 | 0.47 | 0.78 | 0.70 | 23.22 | 0.77 |
| LSM | 2.18 | 0.25 | 0.59 | 0.50 | 15.35 | 0.65 |
| Galerkin | 2.40 | 1.20 | 1.18 | 0.98 | 14.01 | 0.84 |
| HT-Net | / | 3.33 | 0.65 | 0.59 | 18.47 | 0.79 |
| OFormer | 1.83 | 0.17 | 1.83 | 1.68 | 17.05 | 1.24 |
| GNOT | 0.86 | 3.36 | 0.76 | 0.47 | 13.80 | 1.05 |
| FactFormer | / | 3.12 | 0.71 | 0.60 | 12.14 | 1.09 |
| ONO | 1.18 | 0.48 | 0.61 | 0.52 | 11.95 | 0.76 |
| Transolver | 0.64 | 0.12 | 0.53 | 0.33 | 9.00 | 0.57 |
| Erwin | 0.34 | 0.10 | 2.57 | 0.61 | / | / |
| MSPT (Ours) | 0.48 | 0.10 | 0.51 | 0.31 | 6.32 | 0.63 |
| Relative Promotion | -41% | +17% | +4% | +6% | +30% | -10% |

ShapeNet-Car

| Model | Volume | Surf | C_D | ρ_D |
| --- | --- | --- | --- | --- |
| Simple MLP | 5.12 | 13.04 | 3.07 | 94.96 |
| GraphSAGE | 4.61 | 10.50 | 2.70 | 96.95 |
| PointNet | 4.94 | 11.04 | 2.98 | 95.83 |
| Graph U-Net | 4.71 | 11.02 | 2.26 | 97.25 |
| MeshGraphNet | 3.54 | 7.81 | 1.68 | 98.40 |
| GNO | 3.83 | 8.15 | 1.72 | 98.34 |
| Galerkin | 3.39 | 8.78 | 1.79 | 97.64 |
| geo-FNO | 16.70 | 23.78 | 6.64 | 82.80 |
| GNOT | 3.29 | 7.98 | 1.78 | 98.33 |
| GINO | 3.86 | 8.10 | 1.84 | 98.26 |
| 3D-GEOCA | 3.19 | 7.79 | 1.59 | 98.42 |
| Transolver | 2.07 | 7.45 | 1.03 | 99.35 |
| MSPT (Ours) | 1.89 | 7.41 | 0.98 | 99.41 |
| Relative Promotion | +8.7% | +0.5% | +4.9% | +0.06% |
| AB-UPT | 1.16 | 4.81 | / | / |
| AB-UPT (repr.) | 2.51 | 7.67 | 2.20 | 97.48 |

Volume and Surf are field errors; C_D is the drag coefficient error; ρ_D is the correlation of predicted drag (higher is better).

AhmedML

| Model | Volume | Surf |
| --- | --- | --- |
| PointNet | 5.44 | 8.02 |
| Graph U-Net | 4.15 | 6.46 |
| GINO | 6.23 | 7.90 |
| LNO | 7.59 | 12.95 |
| UPT | 2.73 | 4.25 |
| OFormer | 3.63 | 4.12 |
| Transolver | 2.05 | 3.45 |
| Transformer | 2.09 | 3.41 |
| MSPT (Ours) | 2.04 | 3.22 |
| Relative Promotion | +0.49% | +6.67% |
| AB-UPT | 1.90 | 3.01 |
| AB-UPT (repr.) | 2.39 | 4.33 |

Ablation and Efficiency

Pooling and supernode ablation

Pooling and number of supernodes (Q) ablation on ShapeNet-Car.

Memory and speed comparison by point count

Memory and runtime scaling as the number of points increases.

Quick Start

git clone --recurse-submodules https://github.com/pedrocurvo/mspt.git
cd mspt/Neural-Solver-Library-MSPT
pip install -r requirements.txt
bash ./scripts/StandardBench/plasticity/MSPT.sh

Citation

@misc{curvo2025msptefficientlargescalephysical,
  title={MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention},
  author={Pedro M. P. Curvo and Jan-Willem van de Meent and Maksim Zhdanov},
  year={2025},
  eprint={2512.01738},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.01738},
}