MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention
Pedro M. P. Curvo, Jan-Willem van de Meent, Maksim Zhdanov
TL;DR
The paper tackles the scalability gap in neural PDE solvers for industrial-scale simulations by introducing MSPT, a multi-scale patch transformer. MSPT combines self-attention within local patches with global context from pooled supernodes, and it partitions the domain with ball trees to handle irregular geometries, achieving near-linear scaling with the number of points. It delivers state-of-the-art accuracy on standard PDE benchmarks and large CFD datasets (ShapeNet-Car, AhmedML) while reducing memory and compute costs compared to prior transformer-based solvers. The work includes ablations on patch count and pooling, plus efficiency analyses, highlighting MSPT's potential for million-point on-device inference and real-time design optimization.
Abstract
A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. We introduce the Multi-Scale Patch Transformer (MSPT), an architecture that combines local point attention within patches with global attention to coarse patch-level representations. To partition the input domain into spatially coherent patches, we employ ball trees, which handle irregular geometries efficiently. This dual-scale design enables MSPT to scale to millions of points on a single GPU. We validate our method on standard PDE benchmarks (elasticity, plasticity, fluid dynamics, porous flow) and large-scale aerodynamic datasets (ShapeNet-Car, AhmedML), achieving state-of-the-art accuracy with a substantially lower memory footprint and computational cost.
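The dual-scale idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: a recursive median split along the widest axis stands in for a true ball tree, patch-level representations are obtained by mean pooling, attention is single-head with no learned projections, and the point count equals leaf_size times a power of two so that all patches have equal size. The function names (partition, multiscale_attention) are hypothetical.

```python
import numpy as np


def partition(points, idx, leaf_size):
    """Recursively split indices at the median of the widest axis.

    Simplified stand-in for a ball tree (assumption): it yields spatially
    coherent patches of equal size when len(idx) == leaf_size * 2**k.
    """
    if len(idx) <= leaf_size:
        return [idx]
    extent = points[idx].max(axis=0) - points[idx].min(axis=0)
    axis = int(np.argmax(extent))
    order = idx[np.argsort(points[idx, axis], kind="stable")]
    half = len(order) // 2
    left = partition(points, order[:half], leaf_size)
    right = partition(points, order[half:], leaf_size)
    return left + right


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multiscale_attention(points, feats, leaf_size):
    """Each point attends to its own patch plus one pooled token per patch."""
    n, d = feats.shape
    patches = np.stack(partition(points, np.arange(n), leaf_size))  # (P, S)
    num_patches = patches.shape[0]

    local = feats[patches]            # (P, S, d) tokens grouped by patch
    supernodes = local.mean(axis=1)   # (P, d) mean pooling (assumption)

    # Keys/values per patch: S local tokens followed by all P supernodes.
    shared = np.broadcast_to(supernodes, (num_patches, num_patches, d))
    kv = np.concatenate([local, shared], axis=1)              # (P, S + P, d)

    scores = local @ kv.transpose(0, 2, 1) / np.sqrt(d)       # (P, S, S + P)
    out_patches = softmax(scores, axis=-1) @ kv               # (P, S, d)

    out = np.empty_like(feats)
    out[patches] = out_patches        # scatter back to the original order
    return out, patches


rng = np.random.default_rng(0)
pts = rng.random((64, 3))             # 64 points in 3D
x = rng.standard_normal((64, 4))      # 4 features per point
y, patch_index = multiscale_attention(pts, x, leaf_size=8)
print(y.shape, patch_index.shape)     # (64, 4) (8, 8)
```

In this sketch each query scores S + P keys (S points per patch, P patches) instead of all N points, which is where the savings over full self-attention come from. How the patch count is chosen as the number of points grows is a design decision the sketch does not address.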
