Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs
Chun-Wun Cheng, Jiahao Huang, Yi Zhang, Guang Yang, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero
TL;DR
The paper addresses two problems: solving parametric PDEs efficiently, and the difficulty Transformer-based approaches have in representing continuous dynamics. It introduces the Mamba Neural Operator (MNO), a framework that unifies structured state-space models (SSMs) with neural operators through a formal theoretical connection, improving how long-range dependencies and continuous dynamics are modeled. In theoretical analysis and experiments on PDE benchmarks (e.g., Darcy Flow, Shallow Water, Diffusion-Reaction), MNO consistently improves expressive power and accuracy across Transformer-based baselines, with notable gains for Galerkin-type attention and competitive performance for OFormer. The authors conclude that MNO is not merely complementary to Transformers but a superior, scalable framework for PDE-related tasks, with better stability, data efficiency, and generalization on complex spatiotemporal dynamics. The results are relevant to mesh-free PDE solvers and high-fidelity simulations where long-range interactions and continuous dynamics are critical.
Abstract
Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation.
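To illustrate the kind of link between SSMs and kernel-based operators that the abstract refers to, the following minimal sketch discretizes a small diagonal linear SSM and checks that its step-by-step recurrence gives the same output as a causal convolution with a kernel, which is a discrete analogue of a kernel integral. This is not the paper's implementation: the function names, the zero-order-hold discretization, and all parameter values are assumptions chosen for illustration. Mamba's selective SSM makes its parameters depend on the input, so the fixed-kernel equivalence shown here holds only for the simpler time-invariant case.

```python
import math

def discretize_zoh(a, b, dt):
    """Zero-order-hold discretization of a diagonal SSM x' = a*x + b*u (hypothetical helper)."""
    a_bar = [math.exp(dt * a_i) for a_i in a]
    b_bar = [(ab - 1.0) / a_i * b_i for ab, a_i, b_i in zip(a_bar, a, b)]
    return a_bar, b_bar

def ssm_recurrent(a_bar, b_bar, c, u):
    """Run x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = c . x_k one step at a time."""
    x = [0.0] * len(a_bar)
    ys = []
    for u_k in u:
        x = [ab * x_i + bb * u_k for ab, bb, x_i in zip(a_bar, b_bar, x)]
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, x)))
    return ys

def ssm_kernel(a_bar, b_bar, c, length):
    """Kernel K_j = sum_i c_i * a_bar_i**j * b_bar_i for j = 0 .. length-1."""
    return [
        sum(c_i * ab ** j * bb for c_i, ab, bb in zip(c, a_bar, b_bar))
        for j in range(length)
    ]

def ssm_convolution(kernel, u):
    """Causal convolution y_k = sum_{j<=k} K_j * u_{k-j}."""
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1))
        for k in range(len(u))
    ]

# Illustrative values only (assumptions, not taken from the paper).
length, dt = 64, 0.1
a = [-0.5, -1.0, -1.5, -2.0]   # stable diagonal state matrix
b = [1.0, 1.0, 1.0, 1.0]
c = [0.3, -0.7, 1.1, 0.2]
u = [math.sin(2.0 * math.pi * k / length) for k in range(length)]

a_bar, b_bar = discretize_zoh(a, b, dt)
y_rec = ssm_recurrent(a_bar, b_bar, c, u)
y_conv = ssm_convolution(ssm_kernel(a_bar, b_bar, c, length), u)
max_gap = max(abs(r - v) for r, v in zip(y_rec, y_conv))
```

Here `max_gap` measures the largest difference between the two outputs; it should be at the level of floating-point round-off, since both forms compute the same sum in a different order.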
