Training neural operators to preserve invariant measures of chaotic attractors

Ruoxi Jiang; Peter Y. Lu; Elena Orlova; Rebecca Willett

TL;DR

The paper tackles the problem of unstable long-horizon forecasting for chaotic dynamics by shifting the learning objective from trajectory RMSE to preserving invariant measures and time-invariant statistics on chaotic attractors. It introduces two complementary training paradigms in a multi-environment setting: (i) a physics-informed optimal transport loss that matches distributions of carefully chosen summary statistics using the entropy-regularized 2-Wasserstein distance $W^\gamma$ (Sinkhorn) and a bias-corrected Sinkhorn divergence, and (ii) a contrastive feature-learning loss that automatically extracts invariant statistics with an encoder via InfoNCE; both are combined with short-term RMSE. Empirical results on Lorenz-96 and Kuramoto–Sivashinsky show that OT and CL approaches yield substantially better long-term statistical fidelity (histograms, energy spectra, Lyapunov-related metrics) than RMSE alone, while maintaining competitive short-term forecasts. This work advances practical neural-operator emulation for chaotic systems, with potential impacts on climate modeling and turbulence where capturing long-term statistical structure is essential.
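The bias-corrected Sinkhorn divergence mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared Euclidean cost, the uniform sample weights, and the defaults for `gamma` and `n_iters` are all assumptions made for illustration. The debiasing follows the common form $S^\gamma(x, y) = W^\gamma(x, y) - \tfrac{1}{2} W^\gamma(x, x) - \tfrac{1}{2} W^\gamma(y, y)$.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def sinkhorn_cost(x, y, gamma=0.1, n_iters=200):
    """Entropy-regularized OT cost between two uniform empirical measures.

    x: (n, d) and y: (m, d) arrays of summary statistics (hypothetical inputs).
    Assumes a squared Euclidean ground cost and log-domain Sinkhorn updates.
    """
    n, m = x.shape[0], y.shape[0]
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # (n, m)
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f = np.zeros(n)  # dual potential for x
    g = np.zeros(m)  # dual potential for y
    for _ in range(n_iters):
        f = -gamma * _logsumexp((g[None, :] - cost) / gamma + log_b[None, :], axis=1)
        g = -gamma * _logsumexp((f[:, None] - cost) / gamma + log_a[:, None], axis=0)
    # Dual objective value; it approaches the regularized cost as iterations converge.
    return np.mean(f) + np.mean(g)

def sinkhorn_divergence(x, y, gamma=0.1, n_iters=200):
    # Subtract the self-transport terms to remove the entropic bias.
    return (sinkhorn_cost(x, y, gamma, n_iters)
            - 0.5 * sinkhorn_cost(x, x, gamma, n_iters)
            - 0.5 * sinkhorn_cost(y, y, gamma, n_iters))
```

In a training loop, `x` would hold summary statistics of the emulator rollout and `y` those of the noisy data. A differentiable framework would be needed to backpropagate through this quantity; NumPy is used here only to show the computation.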

Abstract

Chaotic systems make long-horizon forecasts difficult because small perturbations in initial conditions cause trajectories to diverge at an exponential rate. In this setting, neural operators trained to minimize squared error losses, while capable of accurate short-term forecasts, often fail to reproduce statistical or structural properties of the dynamics over longer time horizons and can yield degenerate results. In this paper, we propose an alternative framework designed to preserve invariant measures of chaotic attractors that characterize the time-invariant statistical properties of the dynamics. Specifically, in the multi-environment setting (where each sample trajectory is governed by slightly different dynamics), we consider two novel approaches to training with noisy data. First, we propose a loss based on the optimal transport distance between the observed dynamics and the neural operator outputs. This approach requires expert knowledge of the underlying physics to determine what statistical features should be included in the optimal transport loss. Second, we show that a contrastive learning framework, which does not require any specialized prior knowledge, can preserve statistical properties of the dynamics nearly as well as the optimal transport approach. On a variety of chaotic systems, our method is shown empirically to preserve invariant measures of chaotic attractors.
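The contrastive objective referred to above is an InfoNCE-style loss. The sketch below shows one common form of that loss in NumPy and should be read as an illustration under assumptions rather than the paper's training code: the function name `info_nce`, the cosine similarity, the `temperature` default, and the convention that row `i` of `anchors` and row `i` of `positives` are encoder features of two segments from the same trajectory are all hypothetical choices.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of paired feature vectors.

    anchors, positives: (batch, dim) arrays. Row i of each is assumed to come
    from the same trajectory (a positive pair); all other rows in the batch
    act as negatives.
    """
    # Cosine similarity via L2-normalized features.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (batch, batch)
    # Stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal.
    return -np.mean(np.diag(log_prob))
```

Features that score well under this loss are similar for segments of the same trajectory and dissimilar across trajectories, which is why they can serve as learned time-invariant statistics.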

Paper Structure (29 sections, 24 equations, 7 figures, 13 tables)

This paper contains 29 sections, 24 equations, 7 figures, 13 tables.

Introduction
Contributions
Related work
Problem Formulation
Chaotic dynamical systems and invariant measures of chaotic attractors.
Proposed Approaches
Physics-informed optimal transport
Contrastive feature learning
Experiments
Lorenz-96
Kuramoto–Sivashinsky
Discussion and Limitations
Additional Discussion
Contrastive feature learning vs. physics-informed optimal transport
Interactions and trade-offs between short-term prediction and long-term statistics
...and 14 more sections

Figures (7)

Figure 1: The impact of noise on invariant statistics vs. RMSE and Sobolev norm. (a) We show the impact of noise on various error metrics using ground truth simulations of the chaotic Kuramoto–Sivashinsky (KS) system with increasingly noisy initial conditions $\mathbf{U}_G(\mathbf{u}_0 + \eta)$ as well as with added measurement noise $\mathbf{U}_G(\mathbf{u}_0 + \eta) + \eta$. Here, $\mathbf{U}_G(\cdot)$ refers to the ground truth solution of the governing differential equation for the KS system given an initial condition, and $\eta \sim \mathcal{N}(0, r^2 \sigma^2 I)$, where $\sigma^2$ is the temporal variance of the trajectory $\mathbf{U}_G(\mathbf{u}_0)$ and $r$ is the noise scale. Relative RMSE and Sobolev norm [li2022learning], which focus on short-term forecasts, deteriorate rapidly with noise $\eta$, whereas the invariant statistics have a much more gradual response to noise, indicating robustness. (b) The emulator trained with only RMSE degenerates at times into striped patches, while ours is much more statistically consistent with the ground truth. (c) Again, the emulator trained with only RMSE performs the worst in terms of capturing the expected energy spectrum over a long-term prediction.
Figure 2: Our proposed approaches for training neural operators. (a) Neural operators are emulators trained to take an initial state and output future states in a recurrent fashion. To ensure the neural operator respects the statistical properties of chaotic dynamics when trained on noisy data, we propose two additional loss functions for matching relevant long-term statistics. (b) We match the distribution of summary statistics, chosen based on prior knowledge, between the emulator predictions and noisy data using an optimal transport loss. (c) In the absence of prior knowledge, we take advantage of self-supervised contrastive learning to automatically learn relevant time-invariant statistics, which can then be used to train neural operators.
Figure 3: Sampled emulator dynamics and summary statistic distributions. We evaluate our proposed approaches by comparing them to a baseline model that is trained solely using relative RMSE loss. We conduct this comparison on two dynamical systems: (a) Lorenz-96 and (b) Kuramoto–Sivashinsky (KS). For each system, we show a visual comparison of the predicted dynamics (left) and two-dimensional histograms of relevant statistics (middle and right). We observe that training the neural operator with our proposed optimal transport (OT) or contrastive learning (CL) loss significantly enhances the long-term statistical properties of the emulator, as seen in the raw emulator dynamics and summary statistic distributions. The performance of the CL loss, which uses no prior knowledge, is comparable to that of the OT loss, which requires an explicit choice of summary statistics.
Figure 4: The trend of the feature loss as a function of its weight $\lambda$ at noise scale $r=0.3$. The solid lines show the evaluation metrics during the validation phase, which compare the outputs of the neural operator to the noisy data. The dashed lines show the metrics we are actually interested in, which compare the outputs of the neural operator to the clean data and measure the error of the invariant statistics. The horizontal line marks the threshold we set for the RMSE, i.e., $110\%$ of the RMSE at $\lambda = 0$. We observe that (1) as $\lambda$ increases from 0, the feature loss decreases until $\lambda$ reaches 1.0; (2) the RMSE generally increases with $\lambda$; and (3) the unseen statistical error generally decreases with $\lambda$. We report the results for $\lambda = 0.8$ as our final result, since increasing $\lambda$ further does not bring further benefit in decreasing the feature loss, and the RMSE remains in an acceptable range.
Figure 5: Visualization of the predictions when the noise level is $r=0.1$. We evaluate our method by comparing it to the baseline that is trained solely using RMSE. For two different instances (a) and (b), we show a visual comparison of the predicted dynamics (left) and two-dimensional histograms of relevant statistics (middle and right). We notice that, with minimal noise, the predictions obtained from all methods look statistically consistent with the true dynamics.
...and 2 more figures
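
The measurement-noise model in the Figure 1 caption, $\eta \sim \mathcal{N}(0, r^2 \sigma^2 I)$, can be sketched as follows. This is an illustrative sketch, not the authors' data pipeline: the function name `add_scaled_noise`, the `(time, space)` array layout, and the choice to treat $\sigma^2$ as a single scalar (the temporal variance averaged over spatial coordinates) are assumptions.

```python
import numpy as np

def add_scaled_noise(traj, r, rng):
    """Add Gaussian noise with standard deviation r * sigma to a trajectory.

    traj: (time, space) array. sigma^2 is taken here to be the variance over
    time, averaged over spatial coordinates (an assumption; the caption only
    says "temporal variance of the trajectory").
    """
    sigma = np.sqrt(np.mean(np.var(traj, axis=0)))
    eta = rng.normal(loc=0.0, scale=r * sigma, size=traj.shape)
    return traj + eta

# Example with a synthetic stand-in trajectory (not a KS simulation).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 50.0, 2000)[:, None]
clean = np.sin(t + np.arange(16)[None, :])
noisy = add_scaled_noise(clean, r=0.3, rng=rng)
```

Noise on the initial condition, $\mathbf{U}_G(\mathbf{u}_0 + \eta)$, would instead perturb only the first state before running the solver.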
