Convergence of Shallow ReLU Networks on Weakly Interacting Data

Léo Dana; Francis Bach; Loucas Pillaud-Vivien

TL;DR

This work analyzes the training dynamics of a one-hidden-layer ReLU network trained by gradient flow on $n$ data points in a high-dimensional setting with weakly correlated inputs. It establishes that a width $p$ of order $\log(n)$ suffices for global convergence to interpolation with high probability, using a local Polyak-Łojasiewicz (PL) framework and high-dimensional concentration. For orthogonal data, it refines the convergence rate, showing that the exponential rate lies between the orders $1/n$ and $1/\sqrt{n}$, and reveals a phase-transition phenomenon in the PL curvature during training. Complementary experiments corroborate the scaling of the curvature $\mu(t)$ and the threshold behaviors, with practical implications for designing shallow networks in high-dimensional regimes and for future extensions to broader data regimes and deeper models.
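
For context, the textbook argument linking a PL inequality to an exponential rate can be sketched as follows. This is the standard global form for a nonnegative loss $L$ under gradient flow $\dot\theta_t = -\nabla L(\theta_t)$; the symbols $\theta_t$ and the constant $\mu$ are notation assumed here, and the paper's local version along the trajectory may use different constants and a time-dependent curvature.

$$
\|\nabla L(\theta_t)\|^2 \ge 2\mu\, L(\theta_t)
\;\Longrightarrow\;
\frac{d}{dt} L(\theta_t) = -\|\nabla L(\theta_t)\|^2 \le -2\mu\, L(\theta_t)
\;\Longrightarrow\;
L(\theta_t) \le L(\theta_0)\, e^{-2\mu t}.
$$

The last step is Grönwall's inequality. In this convention, an "exponential rate of order $1/n$" corresponds to a curvature $\mu$ of order $1/n$.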

Abstract

We analyse the convergence of one-hidden-layer ReLU networks trained by gradient flow on $n$ data points. Our main contribution leverages the high dimensionality of the ambient space, which implies low correlation of the input samples, to demonstrate that a network with width of order $\log(n)$ suffices for global convergence with high probability. Our analysis uses a Polyak-Łojasiewicz viewpoint along the gradient-flow trajectory, which provides an exponential rate of convergence of order $\frac{1}{n}$. When the data are exactly orthogonal, we give further refined characterizations of the convergence speed, proving that its asymptotic behavior lies between the orders $\frac{1}{n}$ and $\frac{1}{\sqrt{n}}$, and exhibiting a phase-transition phenomenon in the convergence rate, during which it evolves from the lower bound to the upper bound over a relative time of order $\frac{1}{\log(n)}$.
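
The orthogonal-data setting can be explored numerically with a minimal sketch such as the one below. It is an illustration under stated assumptions, not the authors' experimental code: gradient flow is approximated by gradient descent with a small step, the inputs are the standard basis vectors (so $d = n$), the labels are random signs, and the $1/\sqrt{p}$ output scaling, the Gaussian initialization, and the function name `train_shallow_relu` are choices made here for illustration.

```python
import numpy as np

def train_shallow_relu(n=8, p=64, steps=3000, lr=0.2, seed=0):
    """Gradient descent on a one-hidden-layer ReLU network with orthonormal inputs.

    Assumed model (not taken from the paper): f(x) = (1/sqrt(p)) * sum_j a_j * relu(<w_j, x>),
    with loss L = (1/(2n)) * sum_i (f(x_i) - y_i)^2.
    Returns the loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    X = np.eye(n)                          # orthonormal inputs x_i = e_i (ambient dimension d = n)
    y = rng.choice([-1.0, 1.0], size=n)    # random sign labels (assumption)
    W = rng.standard_normal((p, n))        # hidden-layer weights w_j, one row per neuron
    a = rng.standard_normal(p)             # output weights a_j
    scale = 1.0 / np.sqrt(p)

    losses = np.empty(steps)
    for t in range(steps):
        pre = X @ W.T                      # pre-activations, shape (n, p)
        act = np.maximum(pre, 0.0)         # ReLU activations
        resid = scale * (act @ a) - y      # residuals f(x_i) - y_i
        losses[t] = 0.5 * np.mean(resid ** 2)

        active = (pre > 0).astype(float)   # ReLU derivative (0 at the kink)
        grad_a = scale * (act.T @ resid) / n
        grad_W = scale * ((resid[:, None] * active * a[None, :]).T @ X) / n

        a -= lr * grad_a                   # explicit Euler step of the gradient flow
        W -= lr * grad_W
    return losses

if __name__ == "__main__":
    losses = train_shallow_relu()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

Plotting `np.log(losses)` against the step index gives a rough view of the exponential rate; varying `n` and `p` is one way to probe how the rate and the width requirement behave, bearing in mind that the discretization and scaling above are assumptions.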
