On the Benefits of Memory for Modeling Time-Dependent PDEs

Ricardo Buitrago Ruiz; Tanya Marwah; Albert Gu; Andrej Risteski

TL;DR

The paper addresses the challenge of modeling time-dependent PDEs when observations are partial or noisy, arguing that memory of past states can significantly improve predictions in such regimes. It introduces Memory Neural Operator (MemNO), a modular framework that combines a Markovian spatial operator with a memory layer (exemplified by S4) to capture temporal dependencies, and instantiates it as S4FFNO. Theoretical motivation shows memory terms can have arbitrarily large impact in idealized linear settings, while empirical results demonstrate memory-based models outperform memoryless baselines by up to 6x error reduction in low-resolution and high-frequency scenarios, with robust improvements under observation noise in 2D Navier–Stokes. The findings suggest memory-augmented neural operators are particularly valuable for PDE benchmarks with substantial high-frequency content and incomplete observations, enabling more accurate and efficient data-driven solvers in practical settings.
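The composition described above can be sketched in NumPy at the level of shapes and data flow. A Markovian spatial operator is applied independently at each timestep, and a memory layer mixes information across timesteps. This is a minimal sketch under stated assumptions, not the authors' S4FFNO. The spatial layer is a simplified Fourier layer with random weights. The memory layer is a hypothetical diagonal linear recurrence standing in for S4; it does not use S4's parameterization or discretization. The names `fourier_layer`, `memory_layer` and `memno_forward` are illustrative only.

```python
import numpy as np

def fourier_layer(v, weights, n_modes):
    """Simplified FNO-style layer acting on a single timestep.

    v: (S, H) real hidden state on an equispaced 1D grid.
    weights: (n_modes, H, H) complex channel-mixing weights, one matrix per retained mode.
    """
    v_hat = np.fft.rfft(v, axis=0)                   # (S//2 + 1, H) Fourier coefficients
    out_hat = np.zeros_like(v_hat)
    # Mix channels on the lowest n_modes frequencies only; higher modes are dropped.
    out_hat[:n_modes] = np.einsum("kh,khg->kg", v_hat[:n_modes], weights)
    out = np.fft.irfft(out_hat, n=v.shape[0], axis=0)
    return np.maximum(v + out, 0.0)                  # residual + ReLU (an assumption)

def memory_layer(seq, decay):
    """Hypothetical stand-in for S4: a diagonal linear recurrence over time.

    seq: (T, S, H). Every (spatial point, channel) pair is an independent 1D sequence,
    h_t = decay * h_{t-1} + x_t, so the output at time t depends on all earlier inputs.
    """
    h = np.zeros(seq.shape[1:])
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        h = decay * h + seq[t]
        out[t] = h
    return out

def memno_forward(u_hist, params, n_modes):
    """u_hist: (T, S) past observed states. Returns an (S,) prediction of the next state."""
    v = u_hist[..., None] @ params["lift"]           # (T, S, H): lift to the hidden dimension
    half = len(params["spatial"]) // 2
    for w in params["spatial"][:half]:               # Markovian layers, applied per timestep
        v = np.stack([fourier_layer(v_t, w, n_modes) for v_t in v])
    v = memory_layer(v, params["decay"])             # memory layer in the middle
    for w in params["spatial"][half:]:
        v = np.stack([fourier_layer(v_t, w, n_modes) for v_t in v])
    return (v[-1] @ params["proj"])[:, 0]            # read out from the last timestep

rng = np.random.default_rng(0)
S, H, T, L, n_modes = 32, 8, 10, 4, 6
params = {
    "lift": rng.normal(size=(1, H)),
    "spatial": [0.1 * (rng.normal(size=(n_modes, H, H)) + 1j * rng.normal(size=(n_modes, H, H)))
                for _ in range(L)],
    "decay": rng.uniform(0.8, 0.95, size=H),
    "proj": rng.normal(size=(H, 1)) / H,
}
u_hist = np.sin(np.linspace(0, 2 * np.pi, S, endpoint=False))[None, :] * np.ones((T, 1))
pred = memno_forward(u_hist, params, n_modes)
print(pred.shape)  # (32,)
```

Setting `decay` to zero turns the memory layer into the identity. The model then becomes Markovian: its prediction depends only on `u_hist[-1]`. This is the contrast between memory-based models and memoryless baselines that the paper studies.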

Abstract

Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving PDEs. For time-dependent PDEs, many approaches are Markovian -- the evolution of the trained system only depends on the current state, and not the past states. In this work, we investigate the benefits of using memory for modeling time-dependent PDEs: that is, when past states are explicitly used to predict the future. Motivated by the Mori-Zwanzig theory of model reduction, we theoretically exhibit examples of simple (even linear) PDEs, in which a solution that uses memory is arbitrarily better than a Markovian solution. Additionally, we introduce Memory Neural Operator (MemNO), a neural operator architecture that combines recent state space models (specifically, S4) and Fourier Neural Operators (FNOs) to effectively model memory. We empirically demonstrate that when the PDEs are supplied in low resolution or contain observation noise at train and test time, MemNO significantly outperforms the baselines without memory -- with up to 6x reduction in test error. Furthermore, we show that this benefit is particularly pronounced when the PDE solutions have significant high-frequency Fourier modes (e.g., low-viscosity fluid dynamics) and we construct a challenging benchmark dataset consisting of such PDEs.
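For a linear system, the Mori-Zwanzig mechanism behind this motivation can be written out in a few lines. The following is a standard derivation and is not reproduced from the paper. It considers a linear evolution $\partial_t u = \mathcal{A} u$ and splits the state into an observed part $u_P$ and an unobserved part $u_Q$. The generic operator $\mathcal{A}$ and its block notation are introduced here for illustration only.

$$
\frac{d}{dt}\begin{pmatrix} u_P \\ u_Q \end{pmatrix}
= \begin{pmatrix} \mathcal{A}_{PP} & \mathcal{A}_{PQ} \\ \mathcal{A}_{QP} & \mathcal{A}_{QQ} \end{pmatrix}
\begin{pmatrix} u_P \\ u_Q \end{pmatrix}
$$

Solving the second row by variation of constants gives

$$
u_Q(t) = e^{t\mathcal{A}_{QQ}} u_Q(0) + \int_0^t e^{(t-s)\mathcal{A}_{QQ}} \mathcal{A}_{QP}\, u_P(s)\, ds,
$$

and substituting this into the first row yields

$$
\frac{d u_P}{dt}
= \underbrace{\mathcal{A}_{PP}\, u_P(t)}_{\text{Markovian}}
+ \underbrace{\int_0^t \mathcal{A}_{PQ}\, e^{(t-s)\mathcal{A}_{QQ}} \mathcal{A}_{QP}\, u_P(s)\, ds}_{\text{memory}}
+ \underbrace{\mathcal{A}_{PQ}\, e^{t\mathcal{A}_{QQ}} u_Q(0)}_{\text{unresolved initial data}}.
$$

The memory term vanishes when $\mathcal{A}_{PQ}$ or $\mathcal{A}_{QP}$ is zero, that is, when observed and unobserved components do not interact. Otherwise, the evolution of $u_P$ depends on its whole history, which a model that sees only the current $u_P$ cannot, in general, represent.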

Paper Structure (40 sections, 5 theorems, 42 equations, 11 figures, 6 tables)

Introduction
Related Work
Preliminaries
Partial Differential Equations (PDEs)
Mori-Zwanzig Formalism
Theoretical Motivation for Memory: a Simple Example
Experimental Setup
Dataset Generation
PDEs with high-frequency Fourier modes
Datasets with different resolutions
Training and Evaluation Procedure
Architecture Framework: Memory Neural Operator
Instantiating the Memory Neural Operator framework: S4FFNO
Memory Helps in Low-Resolution and Input Noise: a Case Study
Kuramoto–Sivashinsky Equation (1D): Study in Low-Resolution
...and 25 more sections

Key Result

Proposition 1

Let $\mathcal{L}: L^2(T;\mathbb{R}) \to L^2(T;\mathbb{R})$ be defined as $\mathcal{L} u(x) = -\Delta u(x) + B \cdot (e^{-ix} + e^{ix}) u(x)$ for $B > 0$. Then, we have:
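This operator is easiest to read in the Fourier basis $\{e^{ikx}\}_{k \in \mathbb{Z}}$ of 2$\pi$-periodic functions. We have $-\Delta e^{ikx} = k^2 e^{ikx}$ and $(e^{-ix} + e^{ix})\, e^{ikx} = e^{i(k-1)x} + e^{i(k+1)x}$, so

$$\mathcal{L} e^{ikx} = k^2 e^{ikx} + B\left(e^{i(k-1)x} + e^{i(k+1)x}\right).$$

In Fourier space, $\mathcal{L}$ is therefore tridiagonal: it couples every mode to its neighbours. Since $B > 0$, any function with a nonzero coefficient at mode $m$ is mapped to one with a nonzero coefficient at mode $m+1$. An observation that keeps only modes $|k| \le m$ cannot see that component. The NumPy sketch below builds a truncated version of this matrix and checks it against evaluating $\mathcal{L}$ pointwise on a grid. The helpers `operator_matrix` and `apply_on_grid` are hypothetical and are not code from the paper.

```python
import numpy as np

def operator_matrix(n: int, B: float) -> np.ndarray:
    """Matrix of L u = -u'' + B (e^{-ix} + e^{ix}) u on the modes k = -n, ..., n.

    Column j holds the Fourier coefficients of L e^{i k_j x}; components that
    would land outside [-n, n] are dropped (truncation).
    """
    ks = np.arange(-n, n + 1)
    M = np.diag(ks.astype(float) ** 2)          # -Δ e^{ikx} = k^2 e^{ikx}
    for j in range(len(ks) - 1):
        M[j + 1, j] = B                         # e^{ix} e^{i k_j x} -> mode k_j + 1
        M[j, j + 1] = B                         # e^{-ix} e^{i k_{j+1} x} -> mode k_j
    return M

def apply_on_grid(coeffs, ks, B, x):
    """Evaluate L u pointwise for u = sum_k c_k e^{ikx}."""
    u = sum(c * np.exp(1j * k * x) for c, k in zip(coeffs, ks))
    neg_lap_u = sum(c * k**2 * np.exp(1j * k * x) for c, k in zip(coeffs, ks))
    return neg_lap_u + B * 2 * np.cos(x) * u    # e^{-ix} + e^{ix} = 2 cos x

n, B = 4, 0.7
ks = np.arange(-n, n + 1)
M = operator_matrix(n, B)

# Random u supported on |k| <= n - 1, so L u still fits inside the modes |k| <= n.
rng = np.random.default_rng(0)
c = rng.normal(size=ks.size) + 1j * rng.normal(size=ks.size)
c[np.abs(ks) == n] = 0.0
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
from_matrix = sum(a * np.exp(1j * k * x) for a, k in zip(M @ c, ks))
print(np.allclose(from_matrix, apply_on_grid(c, ks, B, x)))  # True

# A function living only on mode k = 0 leaks into the neighbouring modes k = -1, +1.
e0 = (ks == 0).astype(float)
print(M @ e0)  # B at k = -1 and k = +1, zero elsewhere
```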

Figures (11)

Figure 1: Diagram of the MemNO framework in 1D (see Architecture Framework: Memory Neural Operator). $S$ denotes the spatial dimension, $H$ the hidden dimension, and $L$ the number of layers. The memory layer is inserted in the middle of the spatial layers, although the framework works with other configurations (see the Appendix on memory layer configurations).
Figure 2: (First row) nRMSE for several models in the KS dataset at different resolutions, where each column is a different viscosity. The final time is $T=2.5s$ and there are $N_t=25$ timesteps. (Second row) A visualization of the whole frequency spectrum at each of the 25 timesteps for a single trajectory in the dataset. The spectrum is obtained with the ground truth solution at resolution 512.
Figure 3: nRMSE for S4FFNO with varying memory window length, for the KS experiment with $\nu=0.1$ and resolution 32. A memory window of $K$ means that the S4 model only has access to the last $K$ timesteps when predicting the next one. At training time, each training sequence is split into chunks of $K$ timesteps, and each chunk is trained independently. At inference time, S4FFNO is given access to the last $K$ predicted timesteps to make the next prediction.
Figure 4: $\nu=10^{-3}$, $T=16s$, $N_t=32$
Figure 5: $\nu=10^{-5}$, $T=3.2s$, $N_t=32$
...and 6 more figures
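The memory-window protocol in the Figure 3 caption can be sketched as follows. During training, each trajectory is split into chunks of $K$ timesteps. At inference, the model rolls out autoregressively and sees only its last $K$ states. This is a minimal sketch under several assumptions:

- `model` is any callable mapping a `(k, S)` history to the next `(S,)` state.
- Chunks do not overlap.
- The rollout is seeded with observed initial states.
- nRMSE uses the relative $L^2$ definition common in PDE benchmarks.

The paper's exact choices may differ. All function names are hypothetical.

```python
import numpy as np

def split_into_chunks(traj, K):
    """Split a (T, S) trajectory into non-overlapping (K, S) training chunks."""
    n_chunks = traj.shape[0] // K
    return [traj[i * K:(i + 1) * K] for i in range(n_chunks)]

def rollout_with_window(model, init, n_steps, K):
    """Autoregressive rollout in which the model only sees the last K states.

    init: (t0, S) observed initial states. Returns an array of shape (t0 + n_steps, S).
    """
    states = list(init)
    for _ in range(n_steps):
        window = np.stack(states[-K:])          # at most K most recent states
        states.append(model(window))
    return np.stack(states)

def nrmse(pred, true):
    """Relative L2 error of one trajectory: ||pred - true|| / ||true|| (assumed definition)."""
    return np.linalg.norm(pred - true) / np.linalg.norm(true)

S, T, K = 16, 25, 4
u0 = np.sin(np.linspace(0, 2 * np.pi, S, endpoint=False))
true = np.stack([0.9**t * u0 for t in range(T)])    # toy decaying trajectory
toy_model = lambda window: 0.9 * window[-1]         # hypothetical stand-in for S4FFNO
pred = rollout_with_window(toy_model, true[:1], T - 1, K)
chunks = split_into_chunks(true, K)
print(pred.shape, len(chunks))  # (25, 16) 6
```

Here $K$ bounds only what the model sees at each step, not how far the rollout goes. A small $K$ limits how much of the history a memory layer can exploit.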

Theorems & Definitions (17)

Definition 1: Space of square integrable functions
Definition 2: Time-Dependent PDE
Definition 3: Equispaced grid with resolution $f$
Definition 4: Basis for 2$\pi$-periodic functions
Definition 5: Fourier truncation measurement
Proposition 1
Proposition 2
Theorem 1: Effect of memory
Remark 1
Remark 2
...and 7 more
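Definitions 3 and 5 name two ways of observing a solution at limited resolution. Their precise statements are not reproduced here. This sketch assumes a common reading of each. An equispaced grid with resolution $f$ samples the function at $f$ points of $[0, 2\pi)$. A Fourier truncation measurement keeps only the coefficients with $|k|$ up to a cutoff.

The two differ on high-frequency content. On the grid, a mode above the resolvable range aliases onto a low mode. Under truncation it is simply discarded. The helper names are hypothetical.

```python
import numpy as np

def sample_on_grid(fn, f):
    """Evaluate a 2π-periodic function at f equispaced points x_j = 2πj/f (assumed reading)."""
    x = 2 * np.pi * np.arange(f) / f
    return fn(x)

def fourier_truncate(coeffs: dict, cutoff: int) -> dict:
    """Keep only the Fourier coefficients c_k with |k| <= cutoff (assumed reading)."""
    return {k: c for k, c in coeffs.items() if abs(k) <= cutoff}

f = 8
# u(x) = cos(x) + 0.5 cos(9x); mode 9 is above what 8 grid points can resolve.
coeffs = {1: 0.5, -1: 0.5, 9: 0.25, -9: 0.25}
u = lambda x: sum(c * np.exp(1j * k * x) for k, c in coeffs.items()).real

# Grid sampling: on 8 points e^{i9x} equals e^{ix}, so mode 9 aliases onto mode 1.
grid_coeffs = np.fft.fft(sample_on_grid(u, f)) / f
print(np.round(grid_coeffs[1].real, 6))         # 0.75 = 0.5 + 0.25

# Fourier truncation: mode 9 is discarded and the retained low modes are exact.
print(fourier_truncate(coeffs, cutoff=f // 2))  # {1: 0.5, -1: 0.5}
```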
