Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
Ichiro Hashimoto
TL;DR
The paper studies benign overfitting in fixed-width leaky ReLU two-layer networks trained by gradient descent on mixture data with exponential loss. It establishes directional convergence of the network parameters under gradient descent and gives a precise characterization of the convergent direction, which yields a linear decision boundary even outside the near-orthogonal and sub-Gaussian regimes that previous work relied on. By deriving both upper and lower classification error bounds and identifying a phase transition in the bound as a function of signal strength, the work broadens the settings in which benign overfitting can occur and shows when it provably fails (e.g., Gaussian mixtures in certain regimes). The results hold under deterministic conditions on the data that can be verified with high probability in over-parameterized regimes, and they extend to heavier-tailed mixtures, a step beyond the NTK/lazy-learning paradigm.
Abstract
In this paper, we study benign overfitting of fixed-width leaky ReLU two-layer neural network classifiers trained on mixture data via gradient descent. We provide both upper and lower classification error bounds, and discover a phase transition in the bound as a function of signal strength. The lower bound leads to a characterization of cases in which benign overfitting provably fails even if directional convergence occurs. Our analysis allows us to considerably relax the distributional assumptions made in existing work on benign overfitting of leaky ReLU two-layer neural network classifiers: we allow for non-sub-Gaussian data and do not require near orthogonality. Our results are derived by establishing directional convergence of the network parameters and studying classification error bounds for the convergent direction. Previously, directional convergence in (leaky) ReLU neural networks was established only for gradient flow. By first establishing directional convergence for gradient descent, we are able to study benign overfitting of fixed-width leaky ReLU two-layer neural network classifiers in a much wider range of scenarios than was done before.
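The setting described in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's experiment: the Gaussian noise, the mean vector along the first axis, the fixed second layer `a`, the initialization scale, the step size, and all sizes (`n`, `p`, `m`) are assumptions chosen only to make the example run. It trains the first layer of a two-layer leaky ReLU network by gradient descent on the exponential loss and records how much the normalized parameter `W / ||W||_F` moves per step, which is one simple way to observe directional convergence empirically.

```python
import numpy as np


def make_mixture(n, p, signal, rng):
    """Two-component mixture: x_i = y_i * mu + z_i with standard Gaussian noise z_i."""
    mu = np.zeros(p)
    mu[0] = signal                      # assumed: mean vector along the first axis
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, p))
    return X, y


def forward(X, W, a, gamma):
    """f(x) = sum_j a_j * phi(<w_j, x>), with phi the leaky ReLU of slope gamma."""
    pre = X @ W.T                       # (n, m) pre-activations
    return np.where(pre > 0, pre, gamma * pre) @ a


def train(X, y, m=10, gamma=0.1, lr=0.01, steps=500, seed=0):
    """Gradient descent on the first layer only; returns W, a, and direction changes."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed: second layer fixed to alternating signs, scaled by 1/sqrt(m).
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)
    W = 0.01 * rng.standard_normal((m, p))          # assumed small random init
    prev_dir = W / np.linalg.norm(W)                # Frobenius-normalized direction
    dir_changes = []
    for _ in range(steps):
        pre = X @ W.T
        act_grad = np.where(pre > 0, 1.0, gamma)    # phi'(pre), shape (n, m)
        f = np.where(pre > 0, pre, gamma * pre) @ a
        loss_w = np.exp(-y * f)                     # per-sample exponential loss
        # Gradient of (1/n) * sum_i exp(-y_i f(x_i)) with respect to W, shape (m, p).
        grad = -((loss_w * y)[:, None] * act_grad * a).T @ X / n
        W = W - lr * grad
        cur_dir = W / np.linalg.norm(W)
        dir_changes.append(np.linalg.norm(cur_dir - prev_dir))
        prev_dir = cur_dir
    return W, a, np.array(dir_changes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = make_mixture(n=20, p=500, signal=5.0, rng=rng)   # over-parameterized: p >> n
    W, a, dir_changes = train(X, y)
    train_acc = np.mean(np.sign(forward(X, W, a, 0.1)) == y)
    print("train accuracy:", train_acc)
    print("first / last direction change:", dir_changes[0], dir_changes[-1])
```

Shrinking per-step changes in the normalized parameter are only suggestive of directional convergence; the sketch does not check the paper's conditions or its error bounds. Varying `signal` and evaluating `forward` on fresh samples from `make_mixture` is one way to explore how test error depends on signal strength.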
