On the weight dynamics of learning networks
Nahal Sharafi, Christoph Martin, Sarah Hallerberg
TL;DR
This work casts the weight dynamics of a three-layer feed-forward network under gradient-based learning as a dynamical system and derives its tangent (Jacobian) operator to enable local stability analysis. By computing finite-time Lyapunov exponents (FTLEs), covariant Lyapunov vectors (CLVs), and Lyapunov exponents (LEs), the study links stability in weight space to training outcomes, showing how initialization and activation choices shape the attractor structure and the final loss $c_f$. Crucially, early stability indicators, especially FTLEs, can predict whether training will converge to a low-loss or a high-loss region, which could support early stopping or reinitialization. Although the results are demonstrated on one specific regression task, the framework and findings suggest how a dynamical-systems perspective may inform initialization, activation choice, and monitoring strategies for learning dynamics across architectures.
Abstract
Neural networks have become a widely adopted tool for tackling a variety of problems in machine learning and artificial intelligence. In this contribution we use the mathematical framework of local stability analysis to gain a deeper understanding of the learning dynamics of feed-forward neural networks. To this end, we derive equations for the tangent operator of the learning dynamics of three-layer networks learning regression tasks. The results are valid for an arbitrary number of nodes and arbitrary choices of activation functions. Applying the results to a network learning a regression task, we investigate numerically how stability indicators relate to the final training loss. Although the specific results vary with the choice of initial conditions and activation functions, we demonstrate that it is possible to predict the final training loss by monitoring finite-time Lyapunov exponents or covariant Lyapunov vectors during the training process.
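To make the idea of a tangent operator of the learning dynamics concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain full-batch gradient descent, so that one training step is a discrete map $w \mapsto F(w) = w - \eta \nabla C(w)$ on the weight vector. It approximates the Jacobian of that map by central finite differences, as a stand-in for the analytic tangent operator derived in the paper, and estimates finite-time Lyapunov exponents with the standard QR re-orthonormalization procedure. The network size, the tanh activation, the toy target `sin(pi * x)`, the learning rate, and all function names are illustrative assumptions.

```python
import numpy as np

def unpack(w, n_hidden):
    """Split the flat weight vector into (w1, b1, w2, b2) of a 1-H-1 network."""
    H = n_hidden
    return w[:H], w[H:2 * H], w[2 * H:3 * H], w[3 * H]

def grad_loss(w, x, y, n_hidden):
    """Analytic gradient of C = 0.5 * mean((yhat - y)**2), tanh hidden layer."""
    w1, b1, w2, b2 = unpack(w, n_hidden)
    h = np.tanh(np.outer(x, w1) + b1)       # hidden activations, shape (N, H)
    r = h @ w2 + b2 - y                     # residuals, shape (N,)
    dh = (1.0 - h**2) * w2 * r[:, None]     # backpropagated signal, shape (N, H)
    n = x.size
    return np.concatenate([
        (dh * x[:, None]).sum(axis=0) / n,  # dC/dw1
        dh.sum(axis=0) / n,                 # dC/db1
        h.T @ r / n,                        # dC/dw2
        [r.mean()],                         # dC/db2
    ])

def step(w, x, y, n_hidden, eta):
    """One gradient-descent update, viewed as a discrete map w -> F(w)."""
    return w - eta * grad_loss(w, x, y, n_hidden)

def tangent_operator(w, x, y, n_hidden, eta, eps=1e-5):
    """Jacobian dF/dw by central finite differences (assumed stand-in)."""
    n = w.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (step(w + e, x, y, n_hidden, eta)
                   - step(w - e, x, y, n_hidden, eta)) / (2.0 * eps)
    return J

def ftle_spectrum(w0, x, y, n_hidden, eta, n_steps):
    """Finite-time Lyapunov exponents over n_steps updates, via repeated QR."""
    w = w0.copy()
    Q = np.eye(w.size)                      # orthonormal tangent-space basis
    log_growth = np.zeros(w.size)
    for _ in range(n_steps):
        J = tangent_operator(w, x, y, n_hidden, eta)
        Q, R = np.linalg.qr(J @ Q)          # evolve the basis, re-orthonormalize
        # |diag(R)| holds the per-direction growth factors of this step;
        # the sign of diag(R) is a QR convention, hence the absolute value.
        log_growth += np.log(np.abs(np.diag(R)))
        w = step(w, x, y, n_hidden, eta)
    return log_growth / n_steps, w

# Toy regression task (hypothetical, chosen only for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x)
n_hidden = 3
w0 = rng.normal(scale=0.5, size=3 * n_hidden + 1)

ftle, w_final = ftle_spectrum(w0, x, y, n_hidden, eta=0.1, n_steps=200)
print("FTLE spectrum after 200 steps:", ftle)
```

Repeating the run for different seeds or initialization scales and comparing the early-time exponents with the final training loss gives a rough, small-scale analogue of the monitoring strategy described in the abstract. Finite differences keep the sketch short; for larger networks an analytic or automatically differentiated Jacobian would be the more practical choice.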
