Latent Mamba Operator for Partial Differential Equations

Karn Tiwari, Niladri Dutta, N M Anoop Krishnan, Prathosh A P

TL;DR

Latent Mamba Operator (LaMO) introduces a latent-space, bidirectional state-space model framework for neural operators to solve parametric PDEs with high efficiency and accuracy. By encoding PDE inputs into latent tokens, applying latent SSMs to learn data-dependent kernel integrals, and decoding back to the physical domain, LaMO achieves linear computational complexity in the number of mesh points and delivers state-of-the-art performance across diverse benchmarks. The paper provides theoretical connections between SSMs and kernel-integral operators, and demonstrates substantial empirical gains (average 32.3% improvement over SOTA baselines) on regular grids, structured meshes, and point clouds, including challenging time-dependent flows. These results suggest LaMO’s potential as a scalable foundation model for SciML PDE solving, with future work focusing on pretraining, unsupervised learning, and enhanced scanning strategies.

Abstract

Neural operators have emerged as powerful data-driven frameworks for solving Partial Differential Equations (PDEs), offering significant speedups over numerical methods. However, existing neural operators struggle with scalability in high-dimensional spaces, incur high computational costs, and face challenges in capturing continuous and long-range dependencies in PDE dynamics. To address these limitations, we introduce the Latent Mamba Operator (LaMO), which integrates the efficiency of state-space models (SSMs) in latent space with the expressive power of kernel integral formulations in neural operators. We also establish a theoretical connection between SSMs and the kernel integral of neural operators. In extensive experiments across diverse PDE benchmarks on regular grids, structured meshes, and point clouds, covering solid and fluid physics datasets, LaMO achieves consistent state-of-the-art (SOTA) performance, with a 32.3% improvement over existing baselines in solution operator approximation, highlighting its efficacy in modeling complex PDE solutions.
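The connection between SSMs and kernel integrals mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the paper's construction: it assumes a time-invariant SSM with fixed, hypothetical parameters `A_bar`, `B_bar`, and `C`, whereas LaMO's kernel is described as data-dependent. Under that simplification, unrolling the recurrence $h_k = \bar{A} h_{k-1} + \bar{B} x_k$, $y_k = C h_k$ gives $y_k = \sum_{j \le k} K_{k-j}\, x_j$ with $K_m = C \bar{A}^m \bar{B}$, which is a discrete kernel sum over earlier positions.

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, x):
    """Sequential recurrence: h_k = A_bar h_{k-1} + B_bar x_k, y_k = C h_k."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_k in x:
        h = A_bar @ h + B_bar * x_k
        ys.append(C @ h)
    return np.array(ys)

def ssm_kernel(A_bar, B_bar, C, length):
    """Kernel values K[m] = C A_bar^m B_bar for m = 0, ..., length - 1."""
    K = np.empty(length)
    v = B_bar.copy()
    for m in range(length):
        K[m] = C @ v
        v = A_bar @ v
    return K

# Hypothetical toy parameters (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n, L = 4, 16
A_bar = np.diag(rng.uniform(0.1, 0.9, size=n))  # stable diagonal state matrix
B_bar = rng.normal(size=n)
C = rng.normal(size=n)
x = rng.normal(size=L)

y_scan = ssm_scan(A_bar, B_bar, C, x)
K = ssm_kernel(A_bar, B_bar, C, L)

# Kernel-sum form: y_k = sum_{j <= k} K[k - j] * x[j].
y_kernel = np.array(
    [sum(K[k - j] * x[j] for j in range(k + 1)) for k in range(L)]
)

print(np.allclose(y_scan, y_kernel))
```

The scan costs time linear in the sequence length, while the explicit kernel sum is quadratic. That is one way to read the efficiency argument for applying SSMs to latent tokens.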

Paper Structure

This paper contains 34 sections, 9 theorems, 88 equations, 13 figures, 11 tables, 1 algorithm.

Key Result

Proposition 3.3

The Zero-Order Hold (ZOH) discretization of continuous parameters is expressed as follows: where $\Delta$ denotes the time step, $\mathbf{I}$ is the identity matrix, and $\mathbf{A}, \mathbf{B}$ are continuous system matrices. The discretization method aligns with the Euler method, approximating the matrix exponential by truncating its Taylor series expansion to the first-order term.
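The proposition's equation is not reproduced in this summary. As an assumption about what it states, the form commonly used in the SSM literature is $\bar{\mathbf{A}} = \exp(\Delta \mathbf{A})$ and $\bar{\mathbf{B}} = (\Delta \mathbf{A})^{-1}(\exp(\Delta \mathbf{A}) - \mathbf{I})\,\Delta \mathbf{B}$, whose first-order truncation gives the Euler-style update $\bar{\mathbf{A}} \approx \mathbf{I} + \Delta \mathbf{A}$, $\bar{\mathbf{B}} \approx \Delta \mathbf{B}$. The sketch below checks this numerically for a diagonal $\mathbf{A}$ (a simplification chosen here so that the matrix exponential reduces to an elementwise one); the function names and test values are hypothetical.

```python
import numpy as np

def zoh_diag(a, b, delta):
    """ZOH for diagonal A with entries a (assumed nonzero):
    A_bar = exp(delta * a), B_bar = (exp(delta * a) - 1) / a * b."""
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def euler_diag(a, b, delta):
    """First-order (Euler) truncation: A_bar = 1 + delta * a, B_bar = delta * b."""
    return 1.0 + delta * a, delta * b

def max_gap(a, b, delta):
    """Largest elementwise gap between the ZOH and Euler parameters."""
    a_z, b_z = zoh_diag(a, b, delta)
    a_e, b_e = euler_diag(a, b, delta)
    return max(np.abs(a_z - a_e).max(), np.abs(b_z - b_e).max())

# Hypothetical diagonal system (assumption, not taken from the paper).
a = np.array([-1.0, -2.0, -0.5])
b = np.ones(3)

gap_coarse = max_gap(a, b, 0.01)
gap_fine = max_gap(a, b, 0.005)

# The neglected terms are second order in delta, so halving the step
# should shrink the gap by roughly a factor of four.
print(gap_coarse, gap_fine, gap_coarse / gap_fine)
```

The gap shrinking quadratically with $\Delta$ is consistent with the statement that the Euler form keeps only the first-order term of the Taylor series.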

Figures (13)

  • Figure 1: Overview. (1) The input function $a(x)$ is lifted to a higher-dimensional representation using the lifting operator $\mathcal{P}$. (2) The encoder $\mathcal{E}$ maps the input from the physical domain to the latent domain, where the latent block (Bottom Left) performs the kernel integral via SSMs, applies channel mixing and decodes the latent tokens back to the physical domain using the decoder $\mathcal{D}$. (3) A multi-headed bidirectional SSM (Bottom Right) is applied across the tokens within the latent SSM block. (4) This process is repeated $\mathcal{L}$ times. Finally, the channel dimensions are reduced to the desired output size using the projection operator $\mathcal{Q}$, yielding the final output function $u(x)$.
  • Figure 2: Scalability of model performance on the Darcy Flow benchmark, evaluated along four axes: (Left) Data Efficiency, measuring performance with varying amounts of training data; (Middle Left) Resolution, assessing the impact of different input spatial resolutions; (Middle Right) Model Depth, analyzing performance with an increasing number of layers; and (Right) Embedding Dimension, examining the effect of varying latent space dimensionality. Lower relative $l_2$ error ($\times 10^{-3}$) indicates better performance.
  • Figure 3: Visual Comparison. (Top Row) displays the ground truth, (Middle Row) presents the error heatmap of Transolver, and (Bottom Row) presents the error heatmap of LaMO on the (a) Darcy Flow, (b) Plasticity, and (c) Navier-Stokes benchmarks.
  • Figure 4: Efficiency comparison per epoch of the top five baselines on the (a) Darcy and (b) Airfoil benchmarks.
  • Figure 5: Overview of the benchmark PDEs for the neural operator learning task, classified as Fluid Physics and Solid Physics. (Top Row) shows three PDEs: Darcy Flow, Airfoil, and Elasticity. (Bottom Row) shows three more: Navier-Stokes, Pipe, and Plasticity.
  • ...and 8 more figures
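The data flow described in the Figure 1 caption (lift with $\mathcal{P}$, encode with $\mathcal{E}$, latent bidirectional SSM, channel mixing, decode with $\mathcal{D}$, repeat, project with $\mathcal{Q}$) can be sketched in a few lines to make the tensor shapes concrete. Everything specific here is an assumption made for illustration: the soft-assignment encoder and decoder, the toy diagonal recurrence standing in for the SSM, the residual connection, the weights shared across layers, and all parameter names. It is not the authors' implementation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scan(tokens, decay):
    """Toy diagonal recurrence over the token axis: h_k = decay * h_{k-1} + tokens[k]."""
    h = np.zeros(tokens.shape[1])
    out = np.empty_like(tokens)
    for k in range(tokens.shape[0]):
        h = decay * h + tokens[k]
        out[k] = h
    return out

def bidirectional_scan(tokens, decay):
    """Sum of a forward scan and a backward scan over the same tokens."""
    fwd = scan(tokens, decay)
    bwd = scan(tokens[::-1], decay)[::-1]
    return fwd + bwd

def lamo_like_forward(a, params, num_layers):
    v = a @ params["P"]                                  # lift: (N, d_in) -> (N, d)
    for _ in range(num_layers):                          # weights shared across layers for brevity
        w = softmax(v @ params["W_assign"], axis=1)      # (N, M) soft assignment (assumed encoder)
        z = (w.T @ v) / (w.sum(axis=0)[:, None] + 1e-8)  # encode: (M, d) latent tokens
        z = bidirectional_scan(z, params["decay"])       # stand-in for the latent SSM
        z = np.tanh(z @ params["W_mix"])                 # channel mixing
        v = v + w @ z                                    # decode to (N, d), with assumed residual
    return v @ params["Q"]                               # project: (N, d) -> (N, d_out)

# Hypothetical sizes: N mesh points, M latent tokens, d channels.
rng = np.random.default_rng(0)
N, M, d_in, d, d_out = 200, 8, 3, 16, 1
params = {
    "P": rng.normal(size=(d_in, d)) / np.sqrt(d_in),
    "W_assign": rng.normal(size=(d, M)) / np.sqrt(d),
    "W_mix": rng.normal(size=(d, d)) / np.sqrt(d),
    "decay": rng.uniform(0.1, 0.9, size=d),
    "Q": rng.normal(size=(d, d_out)) / np.sqrt(d),
}
a = rng.normal(size=(N, d_in))   # input function sampled at N points
u = lamo_like_forward(a, params, num_layers=2)
print(u.shape)
```

In this sketch the encode and decode steps cost $O(NMd)$ and the scan costs $O(Md)$ per layer, so for a fixed number of latent tokens $M$ the total work grows linearly with the number of mesh points $N$.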

Theorems & Definitions (19)

  • Remark 2.1
  • Remark 3.1: ViT Patches as Special Cases of Latent Tokens (Dosovitskiy et al., 2020)
  • Remark 3.2: Multidirectional Scan on Regular Grid
  • Proposition 3.3
  • Theorem 3.4: SSM as an equivalent integral kernel on $\Omega$
  • Lemma 2.1
  • Remark 2.2
  • Proposition 2.4
  • Remark 2.5
  • Corollary 2.6
  • ...and 9 more