On instabilities in neural network-based physics simulators
Daniel Floryan
TL;DR
The paper addresses the problem of long-time instability in neural network–based physics simulators by analyzing the training dynamics that arise when linear dynamical systems are learned via gradient descent under a mean-squared-error loss. It shows that instability arises from three sources. The first is a nonuniform distribution of energy across data directions, which makes convergence uneven. The second is the existence of unlearnable directions in which the data have no energy. The third is initialization-dependent behavior in those directions. These effects play out differently in discrete-time and continuous-time settings. The paper demonstrates that injecting synthetic noise into the training data can stabilize the learned dynamics by adding damping, at the cost of biasing them. It also discusses mitigation strategies such as projecting onto the data subspace and designing stable initializations (e.g., Gershgorin-based schemes). Empirical observations suggest that these insights extend to nonlinear systems, and they offer practical guidance on data processing, initialization, and noise injection for improving the long-time stability of physics simulators.
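The TL;DR mentions Gershgorin-based initializations without giving their details, so the following is only a hedged sketch of one way the Gershgorin circle theorem can be used for this purpose, not the paper's scheme. The theorem states that every eigenvalue of a matrix W lies in at least one disc centered at w_ii with radius R_i = Σ_{j≠i} |w_ij|. For discrete-time dynamics x_{k+1} = W x_k, rescaling each row so that |w_ii| + R_i ≤ r < 1 keeps every eigenvalue inside the disc of radius r. For continuous-time dynamics dx/dt = W x, setting w_ii = -(R_i + m) with m > 0 keeps every eigenvalue in the half-plane Re(λ) ≤ -m. The function names, the Gaussian draws, and the parameters `radius` and `margin` are hypothetical choices made for illustration.

```python
import random

def gershgorin_stable_init_dt(n, radius=0.9, seed=0):
    """Hypothetical discrete-time init: rescale each row of a Gaussian matrix so
    that |w_ii| + sum_{j != i} |w_ij| = radius. By Gershgorin's theorem every
    eigenvalue then satisfies |lambda| <= radius < 1."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        row_sum = sum(abs(w) for w in W[i])
        if row_sum > 0.0:
            W[i] = [w * radius / row_sum for w in W[i]]
    return W

def gershgorin_stable_init_ct(n, margin=0.1, seed=0):
    """Hypothetical continuous-time init: keep random off-diagonal entries and set
    w_ii = -(sum_{j != i} |w_ij| + margin), so every Gershgorin disc, and hence
    every eigenvalue, lies in the half-plane Re(lambda) <= -margin."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / n ** 0.5) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        off_diag = sum(abs(W[i][j]) for j in range(n) if j != i)
        W[i][i] = -(off_diag + margin)
    return W

# Usage: a rollout of x_{k+1} = W x_k from the discrete-time init decays, because
# a max absolute row sum of 0.9 shrinks the max-norm of x by at least 0.9 per step.
W = gershgorin_stable_init_dt(4)
x = [1.0, 1.0, 1.0, 1.0]
for _ in range(200):
    x = [sum(W[i][k] * x[k] for k in range(4)) for i in range(4)]
print(max(abs(v) for v in x) < 1e-6)  # -> True
```

Both constructions only constrain the weights at initialization; gradient descent remains free to move the eigenvalues afterwards.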
Abstract
When neural networks are trained from data to simulate the dynamics of physical systems, they encounter a persistent challenge: the long-time dynamics they produce are often unphysical or unstable. We analyze the origin of such instabilities when learning linear dynamical systems, focusing on the training dynamics. We make several analytical findings, and empirical observations suggest that they extend to nonlinear dynamical systems. First, the rate of convergence of the training dynamics is uneven and depends on the distribution of energy in the data. As a special case, the dynamics in directions where the data have no energy cannot be learned. Second, in these unlearnable directions, the dynamics produced by the neural network depend on the weight initialization, and common weight initialization schemes can produce unstable dynamics. Third, injecting synthetic noise into the data during training adds damping to the training dynamics and can stabilize the learned simulator, though doing so undesirably biases the learned dynamics. For each contributor to instability, we suggest mitigation strategies. We also highlight important differences between learning discrete-time and continuous-time dynamics, and discuss extensions to nonlinear systems.
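The first three findings can be reproduced in a toy setting. The sketch below assumes exact data pairs (x_k, A x_k) generated by a hypothetical stable map A, and it trains a linear model W by gradient descent on the mean-squared error. With C = (1/N) Σ_k x_k x_k^T, the gradient is 2(W − A)C. The error W − A therefore contracts by a factor (1 − 2ηλ_i) along each eigendirection of C with eigenvalue λ_i. High-energy directions converge quickly and low-energy ones slowly. Zero-energy directions do not converge at all: W keeps its initial action on them. For noise injection, the sketch assumes zero-mean noise with covariance σ²I added to the inputs only, which is an assumption of this example rather than the paper's stated setup. The gradient then becomes 2[W(C + σ²I) − AC]. The σ² term damps every direction, moving the fixed point to AC(C + σ²I)⁻¹. That fixed point is stable in the unlearnable direction but biased toward zero elsewhere. The matrices, the step size η = 0.25, and σ² = 0.1 are illustrative choices.

```python
# Minimal sketch (assumed setup): learn x_{k+1} = A x_k with a linear model W by
# gradient descent on the MSE loss. A, the snapshots, lr and noise_var are
# hypothetical values chosen for illustration.

def matmul(P, Q):
    """Plain-Python matrix product of nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def second_moment(snapshots):
    """Data 'energy' matrix C = (1/N) sum_k x_k x_k^T."""
    n, N = len(snapshots[0]), len(snapshots)
    return [[sum(x[i] * x[j] for x in snapshots) / N for j in range(n)]
            for i in range(n)]

def train(A, C, W0, lr, steps, noise_var=0.0):
    """Gradient descent on E||W (x + e) - A x||^2, where e is zero-mean input
    noise with covariance noise_var * I (assumed noise model).
    The gradient is 2 [W (C + noise_var I) - A C]."""
    n = len(A)
    C_noisy = [[C[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    AC = matmul(A, C)
    W = [row[:] for row in W0]
    for _ in range(steps):
        WC = matmul(W, C_noisy)
        W = [[W[i][j] - 2.0 * lr * (WC[i][j] - AC[i][j]) for j in range(n)]
             for i in range(n)]
    return W

def rollout_norm(W, x0, steps):
    """Euclidean norm of x after iterating x <- W x for `steps` steps."""
    x = x0[:]
    for _ in range(steps):
        x = [sum(W[i][k] * x[k] for k in range(len(x))) for i in range(len(x))]
    return sum(v * v for v in x) ** 0.5

# Energy 1 along e1, 0.01 along e2, and none along e3 (unlearnable direction).
snapshots = [[1.0, 0.1, 0.0], [-1.0, 0.1, 0.0]]
C = second_moment(snapshots)
A = [[0.9, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]   # stable "true" map
W0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5]]  # unlucky initialization

W_clean = train(A, C, W0, lr=0.25, steps=500)
W_noisy = train(A, C, W0, lr=0.25, steps=500, noise_var=0.1)

print([round(W_clean[i][i], 3) for i in range(3)])  # -> [0.9, 0.459, 1.5]
print([round(W_noisy[i][i], 3) for i in range(3)])  # -> [0.818, 0.045, 0.0]
print(rollout_norm(W_clean, [1.0, 1.0, 1.0], 50) > 1e6)   # -> True (blows up)
print(rollout_norm(W_noisy, [1.0, 1.0, 1.0], 50) < 1e-2)  # -> True (decays)
```

Without noise, the e1 rate is learned exactly, the e2 rate is still far from 0.5 after 500 steps, and the e3 rate stays at its unstable initial value of 1.5. With noise, the simulator is stable, but its rates shrink by the factor λ/(λ + σ²), for example 0.9 → 0.818. Larger values of `noise_var` damp more strongly and pull the learned rates further from those of A.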
