Dynamical stability and chaos in artificial neural network trajectories along training
Kaloyan Danovski, Miguel C. Soriano, Lucas Lacasa
TL;DR
This work treats neural network training as a discrete-time dynamical system in graph (weight) space and analyzes how the choice of learning rate $\eta$ shapes the dynamical and orbital stability of network trajectories. Studying a shallow network on the Iris classification task, it finds a low-$\eta$ regime in which nearby trajectories diverge non-monotonically but not chaotically, with evidence of marginal stability attributed to flat, high-dimensional loss basins. Larger learning rates reveal an edge-of-stability regime with positive finite-time Lyapunov exponents and non-monotonic loss, and very large learning rates produce a chaotic-intermittent regime with complex weight dynamics. The findings challenge naive convergence expectations, motivate a cross-disciplinary view that combines dynamical-systems tools with ML practice, and suggest further study of how regularization and architecture affect stability across tasks.
Abstract
The process of training an artificial neural network involves iteratively adapting its parameters so as to minimize the error of the network's predictions on a learning task. This iterative change can be naturally interpreted as a trajectory in network space (a time series of networks), and thus the training algorithm (e.g. gradient descent optimization of a suitable loss function) can be interpreted as a dynamical system in graph space. To illustrate this interpretation, we study the dynamical properties of this process by analyzing the network trajectories of a shallow neural network as it learns a simple classification task. We systematically consider different ranges of the learning rate and explore both the dynamical and orbital stability of the resulting network trajectories, finding hints of regular and chaotic behavior depending on the learning rate regime. We contrast our findings with common wisdom on convergence properties of neural networks and dynamical systems theory. This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
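To make the dynamical-systems reading concrete, the following is a minimal sketch of one way to probe the sensitivity of a training trajectory to its initial condition. It is not the authors' code: the synthetic data (random stand-ins with the same shape as a 4-feature, 3-class task), the tanh hidden layer with mean-squared-error loss, the hidden-layer width, and all function names (`init_weights`, `loss_and_grad`, `train`, `divergence`, `finite_time_lyapunov`) are assumptions made for illustration. Two copies of a shallow network are started a distance `eps` apart in weight space and trained with the same full-batch gradient descent map; the growth rate of their separation gives a crude finite-time Lyapunov estimate.

```python
import numpy as np

def init_weights(rng, n_in=4, n_hidden=5, n_out=3):
    # Flattened weight vector of a one-hidden-layer network (no biases, for brevity).
    return rng.normal(scale=0.5, size=n_in * n_hidden + n_hidden * n_out)

def loss_and_grad(w, X, Y, n_hidden=5):
    # Assumed model: tanh hidden layer, linear output, mean-squared-error loss.
    n, n_in = X.shape
    n_out = Y.shape[1]
    W1 = w[: n_in * n_hidden].reshape(n_in, n_hidden)
    W2 = w[n_in * n_hidden :].reshape(n_hidden, n_out)
    H = np.tanh(X @ W1)
    E = H @ W2 - Y
    loss = 0.5 * np.mean(np.sum(E**2, axis=1))
    # Backpropagation for the two weight matrices.
    gW2 = H.T @ E / n
    gW1 = X.T @ ((E @ W2.T) * (1.0 - H**2)) / n
    return loss, np.concatenate([gW1.ravel(), gW2.ravel()])

def train(w0, X, Y, eta, steps):
    # Iterate the map w_{t+1} = w_t - eta * grad L(w_t) and record the trajectory.
    w = w0.copy()
    traj = np.empty((steps + 1, w.size))
    traj[0] = w
    for t in range(steps):
        _, g = loss_and_grad(w, X, Y)
        w = w - eta * g
        traj[t + 1] = w
    return traj

def divergence(eta, steps=200, eps=1e-6, seed=0):
    # Distance over time between two trajectories started eps apart.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(60, 4))                   # synthetic inputs (assumption)
    Y = np.eye(3)[rng.integers(0, 3, size=60)]     # random one-hot labels (assumption)
    w0 = init_weights(rng)
    delta = rng.normal(size=w0.size)
    delta *= eps / np.linalg.norm(delta)
    a = train(w0, X, Y, eta, steps)
    b = train(w0 + delta, X, Y, eta, steps)
    return np.linalg.norm(a - b, axis=1)

def finite_time_lyapunov(d):
    # Average exponential growth rate of the separation over the whole run.
    return np.log(d[-1] / d[0]) / (len(d) - 1)

if __name__ == "__main__":
    for eta in (0.01, 0.1, 0.5):
        d = divergence(eta)
        print(f"eta={eta}: finite-time Lyapunov estimate = {finite_time_lyapunov(d):.4f}")
```

A positive estimate suggests that nearby trajectories separate exponentially on the chosen time window, while a value near zero or below suggests that they do not. The estimate depends on `steps`, `eps`, and the perturbation direction, so averaging over several random perturbations would give a more reliable picture, and very large `eta` may cause the weights in this toy setup to overflow.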
