Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems

Qi Liu, Laure Zanna, Joan Bruna

Abstract

Recent advances in autoregressive neural surrogate models have enabled orders-of-magnitude speedups in simulating dynamical systems. However, autoregressive models are generally prone to distribution drift: compounding errors in autoregressive rollouts that severely degrade generation quality over long time horizons. Existing work attempts to address this issue by implicitly leveraging the inherent trade-off between short-time accuracy and long-time consistency through hyperparameter tuning. In this work, we introduce a unifying mathematical framework that makes this trade-off explicit, formalizing and generalizing hyperparameter-based strategies in existing approaches. Within this framework, we propose a robust, hyperparameter-free model implemented as a conditional diffusion model that balances short-time fidelity with long-time consistency by construction. Our model, the Self-refining Neural Surrogate model (SNS), can be implemented as a standalone model that refines its own autoregressive outputs or as a complementary model to existing neural surrogates to ensure long-time consistency. We also demonstrate the numerical feasibility of SNS through high-fidelity simulations of complex dynamical systems over arbitrarily long time horizons.

Paper Structure

This paper contains 22 sections, 5 theorems, 56 equations, 9 figures, 2 algorithms.

Key Result

Theorem 3.1

Starting from an initial point $(\mathbf{x}_t^{s_1^0},\mathbf{x}_{t-1}^{s_2^0})$, consider a discretized traversal path in reverse process phase space: $\{(s_1^k,s_2^k)\}_{k=0}^{N}$, with $(s_1^0,s_2^0)$ the starting point and $(s_1^N,s_2^N)=(0,0)$. Assume the path is monotonically decreasing in ea… is independent of the particular traversal path in reverse process phase space, i.e., for any two mon…
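
The path setup in Theorem 3.1 can be made concrete with a small sketch. The helper names below are hypothetical and do not come from the paper, and the check assumes that "monotonically decreasing" means neither noise level ever increases along the path, which the truncated statement above does not spell out. Here $s_1$ is the noise level of $\mathbf{x}_t$ and $s_2$ that of $\mathbf{x}_{t-1}$.

```python
from typing import List, Tuple

# A discretized path {(s1^k, s2^k)} for k = 0..N in reverse process phase space.
Path = List[Tuple[float, float]]


def is_monotone_path(path: Path) -> bool:
    """True if the path ends at (0, 0) and neither noise level ever increases.

    Assumption: this is one reading of "monotonically decreasing".
    """
    if not path or path[-1] != (0.0, 0.0):
        return False
    return all(
        a1 >= b1 and a2 >= b2
        for (a1, a2), (b1, b2) in zip(path, path[1:])
    )


def diagonal_path(s1: float, s2: float, n: int) -> Path:
    """Hypothetical strategy: lower both noise levels together in n equal steps."""
    return [(s1 * (n - k) / n, s2 * (n - k) / n) for k in range(n + 1)]


def staircase_path(s1: float, s2: float, n: int) -> Path:
    """Hypothetical strategy: lower s1 to zero first, then s2, n steps each."""
    first = [(s1 * (n - k) / n, s2) for k in range(n + 1)]
    second = [(0.0, s2 * (n - k) / n) for k in range(1, n + 1)]
    return first + second
```

Both `diagonal_path` and `staircase_path` start at the same point and end at $(0,0)$, so they are two examples of the kind of monotone path the theorem compares.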

Figures (9)

  • Figure 1: Top 5 rows: Vorticity fields of the Kolmogorov flow from a numerical solver, Gaussian approximation of the transition density, ACDM with 200 denoising steps, Thermalizer and SNS refined trajectories for Kolmogorov flow over a trajectory of $15,000$ steps. Bottom row: kinetic energy spectra at each timestep, averaged over $20$ randomly initialized trajectories.
  • Figure 2: Phase space for the forward and reverse processes: moving to the right and moving to the left correspond to the forward process and the reverse process for $\mathbf{x}_{t-1}$, respectively. Moving up and down corresponds to the forward and the reverse process for $\mathbf{x}_t$, respectively. The dashed curved arrows represent a jump in the reverse process.
  • Figure 3: The x-axis corresponds to the diffusion time of the forward process for the conditional variable. The images below the x-axis are realizations of the forward process with the same initial condition $\mathbf{x}_{t-1}^{0}$ at different times $s$. The images in the top row are in direct correspondence with the images below via an estimator of the multi-noise-level denoising oracle $\hat{\mathbf{x}}_t^0 =\mathbf{D}_{\theta}(\mathbf{x}_t^S, \mathbf{x}_{t-1}^{s})$. The y-axis is the $L^2$ distance between the denoised image and the original image. The degradation in denoising accuracy and the smoothing of fields with higher noise injection in the conditional variable align with the theory.
  • Figure 4: Top 6 rows: Top and bottom layer vorticity fields of the two-layer QG system from a numerical solver, Gaussian approximation of the transition density, and SNS refined trajectories for Kolmogorov flow over a trajectory of $20,000$ steps. Bottom 2 rows: kinetic energy spectra of top and bottom layer at each timestep, averaged over $5$ randomly initialized trajectories, with min-max spread shaded.
  • Figure 5: For a trajectory $\mathbf{x}_t \in \mathbb{R}^{d}$, we report two temporal consistency metrics. The spatio-temporal correlation at lag $\tau$ is computed as $C(\tau) = \left\langle \langle \mathbf{x}_t - \bar{\mathbf{x}}_t,\; \mathbf{x}_{t+\tau} - \bar{\mathbf{x}}_{t+\tau} \rangle \, \|\mathbf{x}_t - \bar{\mathbf{x}}_t\|_2^{-1} \, \|\mathbf{x}_{t+\tau} - \bar{\mathbf{x}}_{t+\tau}\|_2^{-1}\right\rangle_{t,\mathrm{IC}}$, where $\bar{\mathbf{x}}_t$ is the spatial mean of $\mathbf{x}_t$ and $\langle \cdot \rangle_{t,\mathrm{IC}}$ denotes averaging over time and initial conditions. The rate of change is computed from the discrete time derivative, $R(t)=\left\|\frac{\mathbf{x}_{t+1} - \mathbf{x}_t}{\Delta t}\right\|_1$, and we plot the mean of $R(t)$ over different random initial conditions. The blue curve is absent from parts of the bottom-left plot for two different reasons. From 0 to 250, it is not visible because it overlaps with the green one, indicating strong temporal accuracy of the point-wise estimate in short rollouts. From 30000 to 30250, it is missing because the point-wise estimates return NaN after a blow-up in scale once the distribution has drifted too far from the stationary distribution.
  • ...and 4 more figures
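
The two temporal consistency metrics in the Figure 5 caption can be sketched in a few lines of dependency-free Python. This is an illustration only, not the authors' code: the function names are hypothetical, each snapshot is assumed to be a flattened list of floats, the rate of change is read as the $L^1$ norm of the finite difference $(\mathbf{x}_{t+1}-\mathbf{x}_t)/\Delta t$, and the average over initial conditions is left out (a single trajectory is averaged over time only).

```python
import math
from typing import List, Sequence

Field = Sequence[float]  # one flattened snapshot x_t (assumed layout)


def _centered(x: Field) -> List[float]:
    """Subtract the spatial mean from a snapshot."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def lag_correlation(traj: Sequence[Field], tau: int) -> float:
    """C(tau) for one trajectory: normalized inner product of centered
    snapshots tau steps apart, averaged over time.

    Assumes no snapshot is spatially constant (otherwise the norm is zero).
    """
    values = []
    for t in range(len(traj) - tau):
        a, b = _centered(traj[t]), _centered(traj[t + tau])
        dot = sum(p * q for p, q in zip(a, b))
        norm_a = math.sqrt(sum(p * p for p in a))
        norm_b = math.sqrt(sum(q * q for q in b))
        values.append(dot / (norm_a * norm_b))
    return sum(values) / len(values)


def rate_of_change(traj: Sequence[Field], dt: float) -> List[float]:
    """R(t) = || (x_{t+1} - x_t) / dt ||_1 for each consecutive pair."""
    return [
        sum(abs(q - p) for p, q in zip(traj[t], traj[t + 1])) / dt
        for t in range(len(traj) - 1)
    ]
```

To reproduce the averaging in the caption, one would call these functions once per initial condition and take the mean of the results.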

Theorems & Definitions (9)

  • Theorem 3.1: Equivalence between monotonic traversal strategies
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3: Minimizer of the multi-noise-level DSM objective
  • proof
  • Corollary 3.4: Equivalence with conditional scores
  • proof
  • Corollary 3.5