CVPR 2026 Highlight

    MSPT

    Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

    Pedro M. P. Curvo · Jan-Willem van de Meent · Maksim Zhdanov

    University of Amsterdam

    Abstract

    Scaling neural solvers without losing global structure

    A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. MSPT combines local point attention within patches with global attention to pooled patch-level representations. To partition irregular geometries into spatially coherent patches, MSPT uses ball trees. This dual-scale design enables scaling to millions of points on a single GPU while maintaining high accuracy on PDE and CFD benchmarks.

    MSPT overview figure with memory and latency comparison
    Parallelized Multi-Scale Attention. Local patch attention and pooled global communication target industrial-scale simulations while reducing memory and latency.

    Contributions

    Introduces Parallelized Multi-Scale Attention (PMSA), combining local patch attention with global cross-patch communication in a single attention pattern.

    Builds MSPT as a multi-block transformer for arbitrary geometries and changing resolutions using ball-tree partitioning and supernode pooling.

    Shows strong performance on standard PDE tasks and large-scale aerodynamic datasets while keeping memory and runtime scaling practical.

    Method

    Patch locally, communicate globally

    Given point features, MSPT partitions the domain into K patches of size L. Each patch is pooled into Q supernodes. Attention is then computed on augmented tokens that include both local patch tokens and global pooled tokens, enabling simultaneous local detail modeling and long-range communication.

    • Partition N spatial elements into K coherent patches using ball trees.
    • Pool each patch into Q supernodes to expose long-range structure without paying full global-attention cost.
    • Run attention over local tokens plus pooled global tokens, yielding complexity O(NL + N^2Q/L).
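The partitioning step above can be illustrated with a simplified median-split scheme of the kind used in ball-tree construction. This is a hypothetical sketch, not the authors' implementation; `ball_tree_partition` and all names are assumptions, and real ball trees additionally track bounding balls per node:

```python
import numpy as np

def ball_tree_partition(points, patch_size):
    """Recursively split points along the widest axis at the median
    (a simplified ball-tree-style construction) until each leaf holds
    at most `patch_size` points. Returns a list of index arrays, one
    per spatially coherent patch."""
    idx = np.arange(len(points))

    def split(ids):
        if len(ids) <= patch_size:
            return [ids]
        pts = points[ids]
        axis = np.argmax(pts.max(0) - pts.min(0))     # axis of widest spread
        order = ids[np.argsort(pts[:, axis])]         # sort along that axis
        mid = len(order) // 2                         # median split
        return split(order[:mid]) + split(order[mid:])

    return split(idx)

rng = np.random.default_rng(0)
pts = rng.random((4096, 3))                # N = 4096 points in 3D
patches = ball_tree_partition(pts, 64)     # yields K = 64 patches of size L = 64
```

Because median splits halve each subset, all patches end up balanced, which keeps the downstream patch attention uniform in cost.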
    MSPT block diagram
    Animated walkthrough of the MSPT sequence: point sampling, patch partitioning, pooled supernode markers, and attention flow.
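The augmented-token attention described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under assumed simplifications (points pre-grouped by patch, uniform mean pooling into supernodes), not the authors' implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def pmsa(x, K, L, Q):
    """Parallelized multi-scale attention, single-head sketch.
    x: (N, d) point features, rows [k*L:(k+1)*L] forming patch k.
    Each patch is mean-pooled into Q supernodes; every token attends
    to its own patch plus all K*Q pooled global tokens."""
    N, d = x.shape
    patches = x.reshape(K, L, d)
    # pool each patch into Q supernodes (here: mean over Q equal chunks)
    supernodes = patches.reshape(K, Q, L // Q, d).mean(axis=2)   # (K, Q, d)
    global_tokens = supernodes.reshape(K * Q, d)                 # shared by all patches
    out = np.empty_like(x)
    for k in range(K):
        q = patches[k]                                    # queries: (L, d)
        kv = np.concatenate([patches[k], global_tokens])  # local + global keys/values
        attn = softmax(q @ kv.T / np.sqrt(d))             # (L, L + K*Q) scores
        out[k * L:(k + 1) * L] = attn @ kv
    return out

x = np.random.default_rng(0).standard_normal((32, 16))
y = pmsa(x, K=4, L=8, Q=2)    # 32 points, 4 patches of 8, 2 supernodes each
```

Each query row scores L local entries plus K·Q global entries, which is exactly the O(NL + N²Q/L) pattern stated above; the per-patch loop is what a batched implementation would parallelize.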

    Complexity

    O(NL + N^2Q/L)

    With practical settings where KQ << N, the architecture preserves local fidelity while keeping global context affordable.
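Plugging representative sizes into the formula makes the saving concrete. The numbers below are illustrative assumptions, not the paper's benchmark settings; the count is of attention-score entries:

```python
# Illustrative token-count comparison: dense global attention vs. PMSA.
N, L, Q = 2 ** 20, 256, 8          # ~1M points, patch size 256, 8 supernodes
K = N // L                          # number of patches

full_cost = N * N                   # dense global attention scores
pmsa_cost = N * L + N * K * Q       # local term + pooled-global term
# N*K*Q equals N^2*Q/L, matching O(NL + N^2*Q/L)

print(full_cost / pmsa_cost)        # ≈ 31.8x fewer score entries
```

Here K·Q = 32,768 pooled tokens against N ≈ 10⁶ points, so the KQ << N condition holds and the global term stays far below dense attention.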

    Main results

    Benchmark performance across PDE and aerodynamic tasks

    PDE Standard Bench

    Relative L2 error where lower is better

    Model              | Elasticity | Plasticity | Airfoil | Pipe | Navier-Stokes | Darcy
    FNO                | /          | /          | /       | /    | 15.56         | 1.08
    MWT                | 3.59       | 0.76       | 0.75    | 0.77 | 15.41         | 0.82
    U-FNO              | 2.39       | 0.39       | 2.69    | 0.56 | 22.31         | 1.83
    geo-FNO            | 2.29       | 0.74       | 1.38    | 0.67 | 15.56         | 1.08
    U-NO               | 2.58       | 0.34       | 0.78    | 1.00 | 17.13         | 1.13
    F-FNO              | 2.63       | 0.47       | 0.78    | 0.70 | 23.22         | 0.77
    LSM                | 2.18       | 0.25       | 0.59    | 0.50 | 15.35         | 0.65
    Galerkin           | 2.40       | 1.20       | 1.18    | 0.98 | 14.01         | 0.84
    HT-Net             | /          | 3.33       | 0.65    | 0.59 | 18.47         | 0.79
    OFormer            | 1.83       | 0.17       | 1.83    | 1.68 | 17.05         | 1.24
    GNOT               | 0.86       | 3.36       | 0.76    | 0.47 | 13.80         | 1.05
    FactFormer         | /          | 3.12       | 0.71    | 0.60 | 12.14         | 1.09
    ONO                | 1.18       | 0.48       | 0.61    | 0.52 | 11.95         | 0.76
    Transolver         | 0.64       | 0.12       | 0.53    | 0.33 | 9.00          | 0.57
    Erwin              | 0.34       | 0.10       | 2.57    | 0.61 | N/A           | N/A
    MSPT (Ours)        | 0.48       | 0.10       | 0.51    | 0.31 | 6.32          | 0.63
    Relative Promotion | -41%       | 17%        | 4%      | 6%   | 30%           | -10%

    ShapeNet-Car

    Relative L2 error where lower is better; rho_D is higher-better

    Model              | Volume | Surf   | C_D   | rho_D
    Simple MLP         | 5.12   | 13.04  | 3.07  | 94.96
    GraphSAGE          | 4.61   | 10.50  | 2.70  | 96.95
    PointNet           | 4.94   | 11.04  | 2.98  | 95.83
    Graph U-Net        | 4.71   | 11.02  | 2.26  | 97.25
    MeshGraphNet       | 3.54   | 7.81   | 1.68  | 98.40
    GNO                | 3.83   | 8.15   | 1.72  | 98.34
    GALERKIN           | 3.39   | 8.78   | 1.79  | 97.64
    GEO-FNO            | 16.70  | 23.78  | 6.64  | 82.80
    GNOT               | 3.29   | 7.98   | 1.78  | 98.33
    GINO               | 3.86   | 8.10   | 1.84  | 98.26
    3D-GEOCA           | 3.19   | 7.79   | 1.59  | 98.42
    Transolver         | 2.07   | 7.45   | 1.03  | 99.35
    MSPT (Ours)        | 1.89   | 7.41   | 0.98  | 99.41
    Relative Promotion | +8.7%  | +0.5%  | +4.9% | +0.06%
    AB-UPT             | 1.16   | 4.81   | N/A   | N/A
    AB-UPT (repr.)     | 2.51   | 7.67   | 2.20  | 97.48

    AhmedML

    Relative L2 error where lower is better

    Model              | Volume | Surf
    PointNet           | 5.44   | 8.02
    Graph U-Net        | 4.15   | 6.46
    GINO               | 6.23   | 7.90
    LNO                | 7.59   | 12.95
    UPT                | 2.73   | 4.25
    OFormer            | 3.63   | 4.12
    Transolver         | 2.05   | 3.45
    Transformer        | 2.09   | 3.41
    MSPT (Ours)        | 2.04   | 3.22
    Relative Promotion | +0.49% | +6.67%
    AB-UPT             | 1.90   | 3.01
    AB-UPT (repr.)     | 2.39   | 4.33

    Ablation and efficiency

    Where the gains come from

    Pooling and supernode ablation on ShapeNet-Car
    Pooling and number of supernodes (Q) ablation on ShapeNet-Car.
    Memory and runtime scaling by point count
    Memory and runtime scaling as the number of points increases.

    Quick Start

    git clone --recurse-submodules https://github.com/pedrocurvo/mspt.git
    cd mspt/Neural-Solver-Library-MSPT
    pip install -r requirements.txt
    bash ./scripts/StandardBench/plasticity/MSPT.sh

    Citation

    @InProceedings{curvo2026mspt,
      title={MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention},
      author={Pedro M. P. Curvo and Jan-Willem van de Meent and Maksim Zhdanov},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2026},
      url={https://arxiv.org/abs/2512.01738},
    }
