
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks

Ichiro Hashimoto

TL;DR

The paper tackles benign overfitting in fixed-width leaky ReLU two-layer networks trained by gradient descent on mixture data with exponential loss. It establishes directional convergence of gradient descent, together with a precise characterization of the convergent direction, which yields a linear decision boundary even outside the near-orthogonal/sub-Gaussian regimes that previous work relied on. By deriving both upper and lower error bounds and identifying a phase transition in performance as a function of signal strength, the work broadens the settings in which benign overfitting can occur and shows when it provably fails (e.g., Gaussian mixtures in certain regimes). The results hold under deterministic data conditions that are verifiable with high probability in over-parameterized regimes and extend to heavier-tailed mixtures, offering a meaningful advance beyond the NTK/lazy-learning paradigm.
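For orientation, the objects named above can be written in a generic form. This is a sketch under assumed notation (fixed second-layer weights $a_j$, leaky ReLU slope $\gamma \in (0,1)$, constant step size $\eta$, neuron index set $J$); the paper's own normalization and its update rule (eq:grds) may differ.

```latex
\begin{aligned}
f(W;\boldsymbol{x}) &= \sum_{j\in J} a_j\,\phi(\boldsymbol{w}_j^\top \boldsymbol{x}),
  \qquad \phi(z) = \max(z,\gamma z),\ 0<\gamma<1,\\
\mathcal{L}(W) &= \frac{1}{n}\sum_{i=1}^{n} \exp\!\bigl(-y_i\, f(W;\boldsymbol{x}_i)\bigr),\\
W^{(t+1)} &= W^{(t)} - \eta\,\nabla\mathcal{L}\bigl(W^{(t)}\bigr),\\
\text{directional convergence:}\quad
\frac{W^{(t)}}{\|W^{(t)}\|_F} &\longrightarrow \hat{W} = \{\boldsymbol{\hat{w}}_j\}_{j\in J}
  \quad (t\to\infty).
\end{aligned}
```

Because the exponential loss has no finite minimizer on separable data, $\|W^{(t)}\|_F$ grows without bound, so the meaningful object is the limit of the normalized iterates rather than of $W^{(t)}$ itself.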

Abstract

In this paper, we study benign overfitting of fixed-width leaky ReLU two-layer neural network classifiers trained on mixture data via gradient descent. We provide both upper and lower classification error bounds and discover a phase transition in the bound as a function of signal strength. The lower bound leads to a characterization of cases in which benign overfitting provably fails even if directional convergence occurs. Our analysis allows us to considerably relax the distributional assumptions made in existing work on benign overfitting of leaky ReLU two-layer neural network classifiers: we allow non-sub-Gaussian data and do not require near orthogonality. Our results are derived by establishing directional convergence of the network parameters and studying classification error bounds for the convergent direction. Previously, directional convergence in (leaky) ReLU neural networks was established only for gradient flow. By first establishing directional convergence, we are able to study benign overfitting of fixed-width leaky ReLU two-layer neural network classifiers in a much wider range of scenarios than in previous work.
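The error bounds concern the misclassification probability of the convergent direction, including for non-sub-Gaussian data. As a rough illustration only (not the paper's bounds), the sketch below estimates by Monte Carlo the test error of a fixed linear direction `theta` on a hypothetical two-component mixture $x = y\,\mu + \xi$ with $\mu = s\,e_1$, comparing Gaussian noise against heavier-tailed Student-t noise with 3 degrees of freedom. The helpers `sample_mixture` and `misclassification_rate`, the choice `theta = e_1`, and all constants are assumptions made for illustration.

```python
import math
import random


def sample_mixture(n, d, signal, noise="gaussian", df=3, seed=0):
    """Draw n labelled points x = y * signal * e_1 + xi (hypothetical mixture model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        if noise == "gaussian":
            xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        else:
            # Student-t coordinates: Z / sqrt(V / df) with V ~ chi^2(df) = Gamma(df/2, scale 2).
            # For df = 3 the tails are polynomial, so the noise is not sub-Gaussian.
            xi = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2, 2.0) / df)
                  for _ in range(d)]
        xi[0] += y * signal  # shift the first coordinate by the class mean
        data.append((xi, y))
    return data


def misclassification_rate(theta, data):
    """Fraction of points misclassified by x -> sign(<theta, x>) (ties count as errors)."""
    wrong = sum(1 for x, y in data if y * sum(t * v for t, v in zip(theta, x)) <= 0)
    return wrong / len(data)


d = 5
theta = [1.0] + [0.0] * (d - 1)  # hypothetical direction; a trained direction would go here
errors = {}
for noise in ("gaussian", "student_t"):
    for signal in (0.5, 1.0, 2.0, 3.0):
        data = sample_mixture(10_000, d, signal, noise=noise, seed=1)
        errors[(noise, signal)] = misclassification_rate(theta, data)
        print(f"{noise:9s}  signal={signal:.1f}  error={errors[(noise, signal)]:.4f}")
```

Because `theta` is aligned with the class mean here, the population error equals $P(\xi_1 \le -s)$: under Gaussian noise it falls off like a Gaussian tail (about 0.0013 at $s = 3$), whereas under Student-t noise with 3 degrees of freedom it decays only polynomially (about 0.029 at $s = 3$). Replacing `theta` with a direction obtained from training would give an empirical counterpart to the error of the convergent direction.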

Paper Structure

This paper contains 23 sections, 35 theorems, 180 equations, and 1 table.

Key Result

Theorem 4.8

Suppose event $E$ holds under one of the following conditions: … Then the gradient descent iterate (eq:grds) keeps all the neurons activated for $t\geq 1$, satisfies $\mathcal{L}(W^{(t)}) = O(t^{-1})$, and converges in direction. Furthermore, the convergent direction $\{\boldsymbol{\hat{w}}_j\}_{j\in J}$ can also be given by $\boldsymbol{\hat{w}}_j = \boldsymbol{w}\ldots$ Lastly, the resulting network $f(\cdot)$ …
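Theorem 4.8 makes claims about the iterates $W^{(t)}$: the loss decays like $O(t^{-1})$ and the iterates converge in direction. The sketch below is not the paper's code; it runs full-batch gradient descent with exponential loss on a hypothetical over-parameterized Gaussian mixture ($n = 10$, $d = 30$, class mean $\mu = 5e_1$) for a leaky ReLU network with $m = 4$ neurons, fixed second-layer weights $a_j = \pm 1/m$, slope 0.1 and step size 0.05. All of these choices are assumptions, and the toy data are not checked against event $E$. It prints $t\,\mathcal{L}(W^{(t)})$, which stays bounded when the loss decays like $O(t^{-1})$, and the cosine between normalized iterates at $t = 500$ and $t = 1000$ as an empirical proxy for directional convergence.

```python
import math
import random

LEAK = 0.1   # leaky ReLU slope gamma in phi(z) = max(z, gamma * z) (assumed value)
ETA = 0.05   # constant step size (assumed value)


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def phi(z):
    return z if z > 0 else LEAK * z


def phi_prime(z):
    return 1.0 if z > 0 else LEAK


def loss_and_grad(W, a, data):
    """Exponential loss (1/n) sum_i exp(-y_i f(W; x_i)) and its gradient in W,
    for the two-layer network f(W; x) = sum_j a_j * phi(<w_j, x>) with fixed a_j."""
    n = len(data)
    grad = [[0.0] * len(w_j) for w_j in W]
    loss = 0.0
    for x, y in data:
        pre = [dot(w_j, x) for w_j in W]                  # pre-activations <w_j, x>
        out = sum(a_j * phi(z) for a_j, z in zip(a, pre))
        e = math.exp(-y * out)
        loss += e / n
        for j, z in enumerate(pre):
            coef = -y * e * a[j] * phi_prime(z) / n       # d(loss_i / n) / d<w_j, x>
            g = grad[j]
            for k, x_k in enumerate(x):
                g[k] += coef * x_k
    return loss, grad


def unit_direction(W):
    """Flatten W and rescale it to unit Frobenius norm."""
    flat = [v for w_j in W for v in w_j]
    norm = math.sqrt(dot(flat, flat))
    return [v / norm for v in flat]


rng = random.Random(0)
n, d, m = 10, 30, 4                      # d > n: over-parameterized regime (assumed sizes)
mu = [5.0] + [0.0] * (d - 1)             # hypothetical class mean: x = y * mu + N(0, I)
data = []
for _ in range(n):
    y = rng.choice((-1, 1))
    data.append(([y * mu_k + rng.gauss(0.0, 1.0) for mu_k in mu], y))

a = [1.0 / m if j % 2 == 0 else -1.0 / m for j in range(m)]       # fixed second layer (assumed)
W = [[rng.gauss(0.0, 0.01) for _ in range(d)] for _ in range(m)]  # small random initialization

T = 1000
losses, directions = [], {}
for t in range(T):
    loss, grad = loss_and_grad(W, a, data)                        # evaluated at W^(t)
    losses.append(loss)
    W = [[w - ETA * g for w, g in zip(w_j, g_j)] for w_j, g_j in zip(W, grad)]  # W^(t+1)
    if t + 1 in (T // 2, T):
        directions[t + 1] = unit_direction(W)

for t in (1, 10, 100, T - 1):
    print(f"t={t:4d}  loss={losses[t]:.3e}  t*loss={t * losses[t]:.3f}")
cos_sim = dot(directions[T // 2], directions[T])
print(f"cos(W^({T // 2}), W^({T})) after normalization: {cos_sim:.4f}")
```

A cosine close to 1 only shows that $W^{(t)}/\|W^{(t)}\|_F$ drifts slowly over a finite window; directional convergence itself is a statement about the limit $t \to \infty$, which is what the theorem establishes under its conditions.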

Theorems & Definitions (65)

  • Theorem 4.8: Directional Convergence on Mixture Data
  • Theorem 5.1: Classification Error Bounds
  • Theorem 6.2
  • Theorem 6.3
  • Lemma 7.1
  • Corollary A.1
  • Lemma B.1
  • proof
  • Remark
  • Lemma B.2
  • ...and 55 more