State-space models are accurate and efficient neural operators for dynamical systems

Zheyuan Hu; Nazanin Ahmadi Daryakenari; Qianli Shen; Kenji Kawaguchi; George Em Karniadakis

TL;DR

The paper addresses the challenge of learning operators for dynamical systems accurately and efficiently, especially under long-time integration and extrapolation, by applying Mamba, a state-space model that dynamically captures long-range dependencies with linear-time inference. Across benchmarks spanning 1D ODEs, discontinuous dynamics, long-time integration, chaotic systems, and a real-world PK-PD application, Mamba consistently ranks among the top models against strong baselines (RNNs, Transformers, neural operators) at lower computational cost. The results indicate that Mamba is robust in both interpolation and extrapolation, scales to long sequences, and applies to data-scarce real-world problems when combined with physics information. Overall, the paper positions state-space modeling as an efficient framework for scientific machine learning in dynamical-systems modeling, with potential for PDE extensions and parameter-inference tasks in pharmacology and beyond.

Abstract

Physics-informed machine learning (PIML) has emerged as a promising alternative to classical methods for predicting dynamical systems, offering faster and more generalizable solutions. However, existing models, including recurrent neural networks (RNNs), transformers, and neural operators, struggle with long-time integration, long-range dependencies, chaotic dynamics, and extrapolation, among other challenges. To this end, this paper introduces state-space models implemented in Mamba for accurate and efficient dynamical system operator learning. Mamba addresses the limitations of existing architectures by dynamically capturing long-range dependencies and enhancing computational efficiency through reparameterization techniques. To test Mamba extensively and compare it against 11 baselines, we introduce several strict extrapolation testbeds that go beyond the standard interpolation benchmarks. We demonstrate Mamba's superior performance in both interpolation and challenging extrapolation tasks. Mamba consistently ranks among the top models while maintaining the lowest computational cost and exceptional extrapolation capabilities. Moreover, we demonstrate Mamba's good performance in a real-world application in quantitative systems pharmacology: assessing the efficacy of drugs in tumor growth under limited data scenarios. Taken together, our findings highlight Mamba's potential as a powerful tool for advancing scientific machine learning in dynamical systems modeling. (The code will be available at https://github.com/zheyuanhu01/State_Space_Model_Neural_Operator upon acceptance.)

Paper Structure (35 sections, 26 equations, 17 figures, 16 tables)

Introduction
Related Work
Neural Operators
Transformers
State-Space Models and Mamba
Methods for Comparative Study
Notation and Problem Definition
State Space Model (SSM) and Mamba
Transformers
Recurrent Neural Networks
Neural Operators
Application of Mamba to dynamical systems
Computational Experiments
1D Dynamical Systems from DeepONet Benchmarks
Finite Regularity and Discontinuous Solutions
...and 20 more sections

Figures (17)

Figure 1: The Mamba block architecture, built on the SSM's mathematical formulation in equation (\ref{eq:ssm_continuous}), which corresponds to the blue SSM block (Sequence Transformation), plus additional components that strengthen the model. The figure is adapted from Figure 3 (right) of Gu and Dao \cite{gu2023mamba}. A Mamba block contains two branches. The left branch is SSM-related: a green linear projection first maps each time step of the input sequence to more feature channels; then a blue one-dimensional convolution block (Conv), a nonlinear activation $\sigma$, and finally the SSM block are applied sequentially. The SSM block takes the input sequence and the model parameters $\boldsymbol{A}, \boldsymbol{B}, \boldsymbol{C}$ for computation. The right branch is the skip-connection \cite{he2016deep} branch, a linear projection followed by a nonlinear activation. The outputs of the two branches are multiplied and linearly transformed into the final output.
Figure 2: Loss trajectories of various models for the 1D pendulum system of Lu et al. \cite{lu2019deeponet}, corresponding to Section \ref{sec:1D_DS_DeepONet} in this paper. The detailed quantitative results for this setting are presented in Table \ref{table:1D_DS}. Subfigure (a) is Mamba versus RNNs; subfigure (b) is Mamba versus NOs; and subfigure (c) is Mamba versus Transformers.
Figure 3: Mamba's predictions for the finite-regularity solutions adopted from Shih et al. \cite{shih2024transformers}, corresponding to Section \ref{sec:finite_regularity} in this paper. The detailed quantitative results for this setting are presented in Table \ref{tab:discontinuous_sol}. First row: subfigure (a) is the Izhikevich model with test data number 0, subfigure (b) is the Izhikevich model with test data number 5, and subfigure (c) is the Izhikevich model with test data number 7. Second row: subfigure (d) is the LIF model with test data number 0, subfigure (e) is the LIF model with test data number 2, and subfigure (f) is the LIF model with test data number 4.
Figure 4: Loss trajectories of the top three models for operators with finite regularity, following Shih et al. \cite{shih2024transformers} and corresponding to Section \ref{sec:finite_regularity} in this paper. The detailed quantitative results for this setting are presented in Table \ref{tab:discontinuous_sol}. Subfigure (a) is the Izhikevich model; subfigure (b) is the LIF model.
Figure 5: Mamba's prediction on one test data point in each of the six test cases following LNO \cite{cao2023lno}, corresponding to Section \ref{sec:lnoode} in this paper. The quantitative results are presented in Table \ref{tab:Lorenz}. Mamba's relative $L_2$ error with respect to time is plotted in Figure \ref{fig:lno_rel_error_t}. Subfigure (a): Lorenz system with $\rho = 5$. Subfigure (b): Lorenz system with $\rho = 10$. Subfigure (c): Duffing oscillator with $c = 0$. Subfigure (d): Duffing oscillator with $c = 0.5$. Subfigure (e): Pendulum with $c = 0$. Subfigure (f): Pendulum with $c = 0.5$.
...and 12 more figures
