On instabilities in neural network-based physics simulators

Daniel Floryan

TL;DR

The paper studies why neural network-based physics simulators become unstable over long time horizons by analyzing the training dynamics of linear dynamical systems learned via gradient descent under a mean-squared error loss. It identifies three contributors to instability: the rate of convergence is uneven and depends on how energy is distributed across directions in the data; the dynamics in directions where the data have no energy cannot be learned; and in those unlearnable directions the learned dynamics are set by the weight initialization, which for common schemes can be unstable. The behavior differs between discrete-time and continuous-time settings. Injecting synthetic noise into the training data adds damping and can stabilize the learned dynamics, at the cost of biasing them. Suggested mitigation strategies include projecting onto the data subspace and designing stable initializations (e.g., Gershgorin-based schemes). Empirical observations suggest that these findings extend to nonlinear systems, and they offer practical guidance on data processing, initialization, and noise injection for improving the long-time stability of physics simulators.

Abstract

When neural networks are trained from data to simulate the dynamics of physical systems, they encounter a persistent challenge: the long-time dynamics they produce are often unphysical or unstable. We analyze the origin of such instabilities when learning linear dynamical systems, focusing on the training dynamics. We make several analytical findings which empirical observations suggest extend to nonlinear dynamical systems. First, the rate of convergence of the training dynamics is uneven and depends on the distribution of energy in the data. As a special case, the dynamics in directions where the data have no energy cannot be learned. Second, in the unlearnable directions, the dynamics produced by the neural network depend on the weight initialization, and common weight initialization schemes can produce unstable dynamics. Third, injecting synthetic noise into the data during training adds damping to the training dynamics and can stabilize the learned simulator, though doing so undesirably biases the learned dynamics. For each contributor to instability, we suggest mitigative strategies. We also highlight important differences between learning discrete-time and continuous-time dynamics, and discuss extensions to nonlinear systems.
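
The three findings can be illustrated on a toy problem. The following is a minimal NumPy sketch, not the paper's experiment: it assumes a discrete-time linear model $x_{k+1} = A x_k$ trained by full-batch gradient descent on a mean-squared error loss, with noise added to the inputs only. The matrix `A_true`, the initialization `A0`, the learning rate, the step count, and the noise level are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth discrete-time dynamics x_{k+1} = A_true x_k (stable: all |eigenvalues| < 1).
A_true = np.diag([0.9, 0.5, 0.7])

# Training inputs: high energy along e1, low energy along e2, zero energy along e3.
m = 200
X = np.zeros((3, m))
X[0] = 1.0 * rng.standard_normal(m)
X[1] = 0.1 * rng.standard_normal(m)
Y = A_true @ X

# Illustrative initialization: zero except for an unstable entry in the zero-energy direction.
A0 = np.zeros((3, 3))
A0[2, 2] = 1.5

def train(A0, X, Y, lr=0.1, steps=1000, noise_std=0.0):
    """Full-batch gradient descent on 0.5 * mean ||A x - y||^2 with optional input noise."""
    A = A0.copy()
    for _ in range(steps):
        # Fresh synthetic noise on the inputs at every step (assumption: inputs only).
        Xn = X + noise_std * rng.standard_normal(X.shape)
        grad = (A @ Xn - Y) @ Xn.T / X.shape[1]
        A -= lr * grad
    return A

def spectral_radius(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

A_clean = train(A0, X, Y)
A_noisy = train(A0, X, Y, noise_std=0.3)

# Uneven convergence: compare the remaining error in the high- and low-energy directions.
err_high = abs(A_clean[0, 0] - A_true[0, 0])
err_low = abs(A_clean[1, 1] - A_true[1, 1])
print("error, high-energy direction:", err_high)
print("error, low-energy direction: ", err_low)

# Unlearnable direction: the third column never receives a gradient without noise.
print("third column unchanged:", np.allclose(A_clean[:, 2], A0[:, 2]))

# Noise adds damping (stabilizing) but biases the learnable entries.
print("spectral radius without noise:", spectral_radius(A_clean))
print("spectral radius with noise:   ", spectral_radius(A_noisy))
print("A[0, 0] with noise (true value 0.9):", A_noisy[0, 0])
```

Without noise, the gradient column for the zero-energy direction is identically zero, so that column of the learned matrix stays at its initial value and the model inherits the instability of the initialization. With input noise, the expected gradient gains a term proportional to the weights themselves, which acts like weight decay: it damps the unlearnable direction but also shrinks the learnable entries away from their true values.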

Paper Structure (5 sections, 23 equations, 3 figures)

Figures (3)

  • Figure 1: Histogram of eigenvalues of an $n \times n$ matrix whose entries are generated using the Glorot normal initializer. $10^5/n$ realizations of the random matrix were used. The unit circle is drawn with a dashed cyan line. $\phi$ gives the fraction of eigenvalues outside of the unit circle. The Glorot uniform initializer produces nearly identical histograms.
  • Figure 2: Histogram of eigenvalues of an $n \times n$ matrix whose entries are generated using the initializer based on Gershgorin's circle theorem. $10^5/n$ realizations of the random matrix were used. The unit circle is drawn with a dashed black line.
  • Figure 3: Training dynamics of a three-dimensional system. The real parts of the eigenvalues are shown. The high-energy mode converges more quickly (blue; in time $\tau_{\text{energy1}}$) than the low-energy mode (red; in time $\tau_{\text{energy2}}$). Noise stabilizes the unlearnable and otherwise unstable mode (green), but biases the learnable dynamics.
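
The kind of experiment behind Figures 1 and 2 can be sketched as follows. This is a hedged NumPy illustration, not the paper's code. `glorot_normal` uses a plain (untruncated) normal distribution with standard deviation $\sqrt{2/(\text{fan\_in} + \text{fan\_out})}$, whereas some library implementations truncate the distribution. `gershgorin_scaled` is a hypothetical initializer of our own that merely uses Gershgorin's circle theorem to keep every eigenvalue inside the unit circle; the paper's actual scheme may differ.

```python
import numpy as np

def glorot_normal(n, rng):
    # Square n x n layer: std = sqrt(2 / (fan_in + fan_out)) = sqrt(1 / n).
    # Assumption: untruncated normal.
    return rng.standard_normal((n, n)) * np.sqrt(2.0 / (n + n))

def gershgorin_scaled(n, rng, margin=0.99):
    # Hypothetical scheme: rescale each row so that sum_j |a_ij| = margin < 1.
    # Each Gershgorin disc (center a_ii, radius sum_{j != i} |a_ij|) then lies
    # inside the unit disc, so every eigenvalue satisfies |lambda| <= margin.
    W = rng.standard_normal((n, n))
    row_sums = np.abs(W).sum(axis=1, keepdims=True)
    return margin * W / row_sums

def fraction_outside_unit_circle(init, n, realizations, rng):
    # Pool the eigenvalues of many random matrices and measure the fraction
    # with modulus greater than one (the quantity called phi in Figure 1).
    eigs = np.concatenate(
        [np.linalg.eigvals(init(n, rng)) for _ in range(realizations)]
    )
    return np.mean(np.abs(eigs) > 1.0)

rng = np.random.default_rng(0)
n = 10
realizations = 10**5 // n  # same count as in the figure captions

phi_glorot = fraction_outside_unit_circle(glorot_normal, n, realizations, rng)
phi_gershgorin = fraction_outside_unit_circle(gershgorin_scaled, n, realizations, rng)

print("phi, Glorot normal:    ", phi_glorot)
print("phi, Gershgorin-scaled:", phi_gershgorin)
```

The unit circle is the stability boundary for discrete-time linear dynamics. For continuous-time models the relevant region is the left half-plane, so a Gershgorin-based scheme would need to place the discs there instead.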