Nonlinear Dynamics In Optimization Landscape of Shallow Neural Networks with Tunable Leaky ReLU
Jingzhou Liu
TL;DR
This paper analyzes the nonlinear dynamics of a shallow two-layer teacher–student neural network with Gaussian inputs and leaky ReLU activation. It develops a framework based on the $G$-equivariant gradient degree to detect symmetry-bearing bifurcations of critical points from the global minimum as the leaky slope $\alpha$ varies, proving width-invariant bifurcation thresholds and a multi-mode degeneracy at $\alpha=0$. The main result shows that branches of nontrivial critical points bifurcate at three critical values in $\Lambda$ with four $S_k$-Specht symmetry types, and that in the engineering regime $\alpha\in(0,1)$ these bifurcations are subcritical, preserving symmetry; a detailed $k=5$ numerical example illustrates the four possible symmetry types. Overall, the work clarifies how intrinsic permutation symmetries constrain the optimization landscape of wide shallow networks and provides a predictive, symmetry-based lens for gradient dynamics in such models.
Abstract
In this work, we study the nonlinear dynamics of a shallow neural network trained with mean-squared loss and leaky ReLU activation. Under Gaussian inputs and equal layer width $k$, (1) we establish, based on the equivariant gradient degree, a theoretical framework, applicable to any number of neurons $k \geq 4$, that detects bifurcations of critical points, together with their associated symmetries, from the global minimum as the leaky parameter $\alpha$ varies. In particular, our analysis reveals that a multi-mode degeneracy consistently occurs at the critical value $\alpha = 0$, independent of $k$. (2) As a by-product, we further show that such bifurcations are width-independent, arise only for nonnegative $\alpha$, and that the global minimum undergoes no further symmetry-breaking instability throughout the engineering regime $\alpha \in (0,1)$. An explicit example with $k=5$ is presented to illustrate the framework and exhibit the resulting bifurcations together with their symmetries.
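
As a rough numerical illustration of the setting described above (not the paper's method), the following sketch estimates the mean-squared teacher-student loss with leaky ReLU by Monte Carlo sampling of Gaussian inputs. Several choices here are assumptions made only for illustration: the teacher weights are taken to be the $k \times k$ identity, the second-layer weights are fixed to one, the expectation over inputs is replaced by a finite sample, and the names `leaky_relu` and `empirical_loss` are hypothetical. The sketch checks two facts that hold for any such loss: it vanishes when the student equals the teacher, and it is unchanged when the student's neurons are permuted.

```python
import numpy as np

def leaky_relu(z, alpha):
    # Leaky ReLU with tunable slope alpha on the negative side.
    return np.where(z > 0, z, alpha * z)

def empirical_loss(W, V, alpha, X):
    # Monte Carlo estimate of the mean-squared loss between a student with
    # weight rows W and a teacher with weight rows V.
    # Assumption: second-layer weights are fixed to 1 for both networks.
    student = leaky_relu(X @ W.T, alpha).sum(axis=1)
    teacher = leaky_relu(X @ V.T, alpha).sum(axis=1)
    return 0.5 * np.mean((student - teacher) ** 2)

k = 5                                   # width used in the explicit example
alpha = 0.1                             # a slope inside the regime (0, 1)
rng = np.random.default_rng(0)
X = rng.standard_normal((20000, k))     # samples of standard Gaussian inputs
V = np.eye(k)                           # assumption: identity teacher

# Student equal to the teacher: the loss is exactly zero (global minimum).
loss_at_teacher = empirical_loss(V, V, alpha, X)

# Permuting the student's neurons leaves the network output unchanged.
perm = rng.permutation(k)
loss_at_permuted = empirical_loss(V[perm], V, alpha, X)

# A small random perturbation of the student moves away from the minimum.
W = V + 0.1 * rng.standard_normal((k, k))
loss_perturbed = empirical_loss(W, V, alpha, X)

print(loss_at_teacher, loss_at_permuted, loss_perturbed)
```

The permutation invariance checked here is the elementary source of the $S_k$ symmetry that the abstract refers to; the bifurcation analysis itself relies on the equivariant gradient degree and is not reproduced by this sketch.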
