Directional Convergence Near Small Initializations and Saddles in Two-Homogeneous Neural Networks
Akshay Kumar, Jarvis Haupt
TL;DR
This work analyzes gradient flow dynamics for two-homogeneous neural networks initialized near the origin. It shows that, for square and logistic losses and sufficiently small initializations, the weights remain near the origin long enough to approximately converge in direction to non-negative KKT points of a neural correlation function (NCF). The authors introduce the NCF and prove directional convergence near initialization and near certain saddle points, using a framework based on differential inclusions and o-minimal definability to handle non-smoothness, with corollaries for separable networks. They discuss higher-order homogeneity and identify open questions for extending the results to general L-homogeneous networks. Overall, the results show how small initializations steer early training toward specific weight directions, offering a principled view of implicit regularization in non-smooth, overparameterized models.
Abstract
This paper examines gradient flow dynamics of two-homogeneous neural networks for small initializations, where all weights are initialized near the origin. For both square and logistic losses, it is shown that for sufficiently small initializations, the gradient flow dynamics spend sufficient time in the neighborhood of the origin to allow the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function that quantifies the correlation between the output of the neural network and corresponding labels in the training data set. For square loss, it has been observed that neural networks undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this, this paper also shows a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
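To make the two central objects concrete, the following is a minimal sketch, not the authors' code. It assumes a bias-free two-layer ReLU network as the example of a two-homogeneous model, and it assumes the neural correlation function takes the form of the sum over training samples of label times network output, which matches the description above but is not spelled out in this section. The names `network_output`, `ncf`, and `scale`, and the toy data, are hypothetical.

```python
# Illustrative sketch only: the network form, the NCF formula, all names,
# and the toy data below are assumptions made for this example.

def relu(z):
    return z if z > 0.0 else 0.0

def network_output(x, a, W):
    """Bias-free two-layer ReLU network: H(x; a, W) = sum_j a_j * relu(<w_j, x>)."""
    total = 0.0
    for a_j, w_j in zip(a, W):
        pre_activation = sum(w * x_k for w, x_k in zip(w_j, x))
        total += a_j * relu(pre_activation)
    return total

def ncf(X, y, a, W):
    """Assumed NCF form: sum_i y_i * H(x_i; a, W)."""
    return sum(y_i * network_output(x_i, a, W) for x_i, y_i in zip(X, y))

def scale(a, W, c):
    """Multiply every weight in the network by the scalar c."""
    return [c * a_j for a_j in a], [[c * w for w in w_j] for w_j in W]

# Toy training set (three samples in R^2) and a two-neuron network.
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
y = [1.0, -1.0, 1.0]
a = [0.5, -0.25]
W = [[1.0, -0.5], [0.25, 1.0]]

base = ncf(X, y, a, W)

# Two-homogeneity: scaling all weights by c > 0 scales the output,
# and hence the assumed NCF, by c**2.
a_scaled, W_scaled = scale(a, W, 3.0)
scaled = ncf(X, y, a_scaled, W_scaled)

print(base, scaled)
```

Under these assumptions, scaling all weights by 3 multiplies the NCF value by 9. Because of this quadratic scaling, the NCF is naturally studied on the unit sphere of weights, which is why the results are stated in terms of the direction of the weights rather than their magnitude.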
