Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
Akshay Kumar, Jarvis Haupt
TL;DR
The paper addresses the non-linear training dynamics of deep $L$-homogeneous networks with $L>2$ under small initialization by linking the early convergence of weight directions to non-negative KKT points of a constrained Neural Correlation Function (NCF). Using a time-rescaled gradient-flow analysis, it shows that, for a sufficiently small initialization scale $\delta$, the weight norms stay $O(\delta)$ and the weight directions align with KKT points of the constrained NCF (or the weights collapse to zero) over a time horizon that scales as $1/\delta^{L-2}$. It also characterizes rank-one KKT points for feed-forward networks with Leaky ReLU and polynomial Leaky ReLU activations, giving necessary and sufficient conditions, and numerical experiments suggest that such points are common in practice. The results offer insight into the low-rank structure that emerges early in training and lay groundwork for understanding training dynamics in deeper, non-NTK regimes; the ReLU case remains a notable open challenge. The work thus connects the theory of deep homogeneous networks in the small-initialization regime with empirical observations of rank-one weight structure, potentially informing generalization and initialization strategies.
Abstract
This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that, for sufficiently small initializations, during the early stages of training the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. This paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
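To make the homogeneity assumption concrete, the following minimal NumPy sketch checks numerically that a bias-free three-layer Leaky ReLU network is positively homogeneous of order $L = 3$ in its weights, i.e. scaling all weights by $c > 0$ scales the output by $c^L$. Everything in it is illustrative rather than taken from the paper: the architecture, the layer sizes, the Leaky ReLU slope `alpha`, the random data, and in particular the form of the neural correlation function, which is assumed here to be the inner product $y^\top H(X; w)$ between the labels and the network outputs.

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    # Positively 1-homogeneous: leaky_relu(c * z) == c * leaky_relu(z) for c > 0.
    return np.where(z > 0, z, alpha * z)

def network(X, weights, alpha=0.1):
    # Bias-free feed-forward network W_3 * sigma(W_2 * sigma(W_1 * X)).
    # Hypothetical architecture chosen for illustration only.
    h = X
    for W in weights[:-1]:
        h = leaky_relu(W @ h, alpha)
    return weights[-1] @ h  # shape (1, n)

def ncf(X, y, weights):
    # ASSUMED form of the neural correlation function: y^T H(X; w).
    return float(y @ network(X, weights).ravel())

rng = np.random.default_rng(0)
d, n, width = 4, 8, 5                      # illustrative sizes
X = rng.standard_normal((d, n))            # inputs stored as columns
y = rng.standard_normal(n)                 # labels
weights = [
    rng.standard_normal((width, d)),
    rng.standard_normal((width, width)),
    rng.standard_normal((1, width)),
]
L = len(weights)                           # order of homogeneity (3 here)

# Scale every weight matrix by the same positive factor c.
c = 0.5
scaled = [c * W for W in weights]

# Homogeneity check: H(X; c w) should equal c^L * H(X; w).
homogeneity_gap = np.max(np.abs(network(X, scaled) - c**L * network(X, weights)))

# The assumed NCF inherits the same order of homogeneity.
ncf_gap = abs(ncf(X, y, scaled) - c**L * ncf(X, y, weights))

# Project the weights onto the unit sphere, the constraint set of the
# constrained NCF, by dividing by the Euclidean norm of all weights together.
total_norm = np.sqrt(sum(np.sum(W**2) for W in weights))
unit_weights = [W / total_norm for W in weights]
sphere_gap = abs(ncf(X, y, unit_weights) - ncf(X, y, weights) / total_norm**L)

print(homogeneity_gap, ncf_gap, sphere_gap)
```

All three printed gaps should be at the level of floating-point round-off. Under the assumed form of the NCF, the last check also illustrates why its value on the unit sphere depends only on the direction of the weights, which is the quantity whose early convergence the paper studies.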
