Towards Understanding Gradient Flow Dynamics of Homogeneous Neural Networks Beyond the Origin
Akshay Kumar, Jarvis Haupt
TL;DR
The paper analyzes the gradient-flow dynamics of $L$-positively homogeneous neural networks in the small-initialization regime, focusing on the phase after the weights escape the origin. It shows that post-escape trajectories closely follow a limiting path ${\mathbf p}(t)$ guided by a second-order positive KKT point of the Neural Correlation Function, which enables a precise characterization of the first saddle point encountered. For feed-forward homogeneous networks, sparsity patterns observed before escape are shown to persist after escape under zero-preserving-subset conditions, linking early feature-learning structure to later optimization dynamics. Although the locally Lipschitz gradient assumption excludes ReLU networks, the work gives a rigorous, tractable description of a meaningful segment of gradient flow beyond the origin and corroborates it with numerical experiments.
Abstract
Recent works on the training dynamics of homogeneous neural networks under gradient flow with small initialization have established that, in the early stages of training, the weights remain small and near the origin but converge in direction. Building on this, the present paper studies the gradient flow dynamics of homogeneous neural networks with locally Lipschitz gradients after they escape the origin. Insights from this analysis are used to characterize the first saddle point encountered by gradient flow after escaping the origin. It is also shown that, for homogeneous feed-forward neural networks and under certain conditions, the sparsity structure that emerges among the weights before the escape is preserved after escaping the origin and until the next saddle point is reached.
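
To make the homogeneity assumption concrete, the sketch below numerically checks the defining property of an $L$-positively homogeneous network: scaling all weights by $c \ge 0$ scales the output by $c^L$. The architecture is an illustrative assumption rather than a model taken from the paper: a two-layer network with squared-ReLU activation, which is 3-homogeneous in its weights and, unlike ReLU, has a locally Lipschitz gradient. The names `squared_relu` and `two_layer_net` are hypothetical.

```python
import numpy as np


def squared_relu(z):
    # Illustrative activation (an assumption, not from the paper):
    # max(z, 0)^2 has derivative 2 * max(z, 0), which is Lipschitz.
    return np.maximum(z, 0.0) ** 2


def two_layer_net(x, W, v):
    # Output v^T sigma(W x). Scaling (W, v) by c >= 0 scales sigma(W x)
    # by c^2 and v by c, so the output scales by c^3.
    return v @ squared_relu(W @ x)


rng = np.random.default_rng(0)
x = rng.standard_normal(4)       # one input sample
W = rng.standard_normal((3, 4))  # hidden-layer weights
v = rng.standard_normal(3)       # output-layer weights

L = 3    # degree of homogeneity for this architecture
c = 0.5  # any non-negative scaling factor works

base_output = two_layer_net(x, W, v)
scaled_output = two_layer_net(x, c * W, c * v)

# Positive homogeneity: H(x; c * w) == c^L * H(x; w)
homogeneity_gap = abs(scaled_output - c**L * base_output)
print(f"homogeneity gap: {homogeneity_gap:.2e}")
```

Replacing `squared_relu` with plain ReLU would still give a positively homogeneous network (of degree 2), but its gradient is discontinuous at zero, which is why the locally Lipschitz gradient assumption excludes it.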
