Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs

Chun-Wun Cheng; Jiahao Huang; Yi Zhang; Guang Yang; Carola-Bibiane Schönlieb; Angelica I Aviles-Rivero

TL;DR

The paper tackles the challenge of efficiently solving parametric PDEs and the limitations of Transformer-based approaches in representing continuous dynamics. It introduces the Mamba Neural Operator (MNO), a framework that unifies structured state-space models (SSMs) with neural operators, providing a theoretical bridge that enables better modeling of long-range dependencies and continuous dynamics. Through rigorous analysis and extensive experiments on PDE benchmarks (e.g., Darcy Flow, Shallow Water, Diffusion-Reaction), MNO consistently improves the expressive power and accuracy of Transformer-based baselines, with notable gains for Galerkin-type attention and competitive performance for OFormer. The findings suggest that MNO is not merely complementary to Transformers but a superior, scalable framework for PDE-related tasks, offering better stability, data efficiency, and generalization for complex spatiotemporal dynamics. This work has practical implications for mesh-free PDE solvers and high-fidelity simulations where long-range interactions and continuous dynamics are critical.

Abstract

Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation.

Paper Structure (19 sections, 2 theorems, 25 equations, 7 figures, 4 tables)

Introduction
Related Work
Mamba Neural Operator
Problem Statement
Preliminaries: Transformer and Mamba
State Space Models Discretisation for PDEs
Network Architecture
Mamba for Neural Operators
Experiments and Discussion
Dataset Description & Implementation Protocol
Darcy Flow.
Shallow Water.
Diffusion Reaction.
Choose Your Winner: Transformer vs. Mamba for PDEs
Why the Winner Wins: Breaking Down Mamba’s Win
...and 4 more sections

Key Result

Proposition 1

The zero-order hold (ZOH) discretisation method is equivalent to the Euler method in SSMs when the Taylor series expansion of the exponential function is truncated to its first-order term.
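For reference, Proposition 1 can be checked with a short derivation. This is a sketch that assumes the standard SSM notation from the Mamba literature (continuous system h'(t) = A h(t) + B x(t), y(t) = C h(t), step size Δ) rather than the paper's own equation, and it assumes ΔA is invertible. The Euler step is also assumed to pair the state at step k−1 with the input at step k, matching the discrete recurrence h_k = Ā h_{k−1} + B̄ x_k.

```latex
\begin{aligned}
&h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t), \\
&\text{ZOH:}\quad \bar{A} = \exp(\Delta A), \qquad
  \bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B, \\
&\exp(\Delta A) = I + \Delta A + \tfrac{1}{2}(\Delta A)^2 + \cdots
  \;\approx\; I + \Delta A \quad \text{(first-order truncation)}, \\
&\Rightarrow\quad \bar{A} \approx I + \Delta A, \qquad
  \bar{B} \approx (\Delta A)^{-1}(\Delta A)\,\Delta B = \Delta B, \\
&\text{Euler:}\quad h_k = h_{k-1} + \Delta\bigl(A\,h_{k-1} + B\,x_k\bigr)
  = (I + \Delta A)\,h_{k-1} + \Delta B\,x_k .
\end{aligned}
```

Both discretisations therefore produce the same recurrence h_k = Ā h_{k−1} + B̄ x_k once the exponential is truncated to first order. The neglected terms are O(Δ²).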

Figures (7)

Figure 1: (A) Illustration of the Mamba Neural Operator. Input image patches are processed along two distinct scanning paths (Bidirectional-Scan). Each sequence generated from these paths is passed through a separate S6 Block / Cross S6 Block for independent processing. The outputs of these blocks are then combined into a feature map, yielding the final output (Bidirectional-Merge). (B) and (C) show the detailed structure of the S6 Block and the Cross S6 Block, respectively. The detailed network architecture and the definition of the Cross S6 Block are given in Appendix A.
Figure 2: Prediction and error maps of GNOT in three versions: Galerkin attention, Softmax attention, and Mamba.
Figure 3: Prediction and error maps of the Galerkin Transformer and OFormer in three versions: Galerkin attention, Softmax attention, and Mamba.
Figure 4: Visualised predictions on the Shallow Water dataset using the Galerkin Transformer (G.T.), original and Mamba versions.
Figure 5: Visualised predictions on the Diffusion Reaction dataset using the Galerkin Transformer (G.T.), original and Mamba versions.
...and 2 more figures
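Figures 2 and 3 compare three interchangeable token mixers inside the same neural-operator backbone: softmax attention, Galerkin attention, and a Mamba (S6) block. Figure 1 describes the Mamba variant as a bidirectional scan over patch tokens.

The NumPy sketch below is a minimal, hypothetical illustration of those three mixers, not the authors' code. It rests on several assumptions:
- The Galerkin form Q (LN(K)^T LN(V)) / n follows the Galerkin Transformer of Cao (2021) as we understand it.
- The selective scan is a simplified single-layer S6-style recurrence with no convolution, gating, or output projection.
- The two scan directions are merged by summation. The caption only says the outputs are "combined".

All function and parameter names are illustrative.

```python
import numpy as np


def softmax_attention(x, Wq, Wk, Wv):
    """Scaled dot-product attention over n tokens: O(n^2 d) cost."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) token-token scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ v


def layer_norm(z, eps=1e-5):
    """Per-token layer normalisation without learnable affine parameters."""
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)


def galerkin_attention(x, Wq, Wk, Wv):
    """Softmax-free Galerkin-type attention Q (LN(K)^T LN(V)) / n: O(n d^2) cost."""
    n = x.shape[0]
    q, k, v = x @ Wq, layer_norm(x @ Wk), layer_norm(x @ Wv)
    return q @ (k.T @ v) / n                         # never forms the (n, n) matrix


def selective_scan(x, A, W_delta, W_B, W_C):
    """Causal S6-style selective scan over an (L, D) sequence: O(L D N) cost.

    Row d of A (shape (D, N), negative entries) holds the diagonal of channel d's
    state matrix; Delta, B and C are computed from the current token (input-dependent).
    """
    L, D = x.shape
    h = np.zeros_like(A)                             # recurrent state, (D, N)
    y = np.empty_like(x)
    for t in range(L):
        delta = np.log1p(np.exp(x[t] @ W_delta))     # softplus -> positive step, (D,)
        B_t, C_t = x[t] @ W_B, x[t] @ W_C            # (N,), (N,)
        A_bar = np.exp(delta[:, None] * A)           # ZOH discretisation of A
        B_bar = delta[:, None] * B_t[None, :]        # first-order B_bar ~ Delta * B
        h = A_bar * h + B_bar * x[t][:, None]        # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = h @ C_t                               # read-out, (D,)
    return y


def bidirectional_ssm(x, params_fwd, params_bwd):
    """Scan tokens in both orders with separate parameters, then merge by sum."""
    y_fwd = selective_scan(x, *params_fwd)
    y_bwd = selective_scan(x[::-1], *params_bwd)[::-1]  # realign to original order
    return y_fwd + y_bwd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, N = 16, 8, 4                       # tokens (patches), channels, state size
    x = rng.standard_normal((n, d))
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def ssm_params():
        A = -np.exp(rng.standard_normal((d, N)))     # strictly negative -> decaying state
        return (A, 0.1 * rng.standard_normal((d, d)),
                0.1 * rng.standard_normal((d, N)), 0.1 * rng.standard_normal((d, N)))

    for name, y in [("softmax", softmax_attention(x, Wq, Wk, Wv)),
                    ("galerkin", galerkin_attention(x, Wq, Wk, Wv)),
                    ("mamba-style", bidirectional_ssm(x, ssm_params(), ssm_params()))]:
        print(name, y.shape)
```

The cost annotations summarise the standard trade-off between these mixers:
- Softmax attention scales quadratically in the number of tokens.
- Galerkin attention reassociates the product to avoid the n × n matrix.
- The scan is linear in sequence length while carrying a recurrent state.

Adding a reversed scan gives every token context from both sides, which a single causal scan cannot provide.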

Theorems & Definitions (7)

Proposition 1
proof
Definition 1
Definition 2
Definition 3
Proposition 2
proof