Hidden Minima in Two-Layer ReLU Networks

Yossi Arjevani

TL;DR

The paper analyzes the optimization landscape of two-layer ReLU networks under squared loss, revealing two infinite families of spurious minima (type I hidden minima and type II) whose Hessian spectra agree up to O(d^{-1/2}). It develops a framework based on tangency sets, o-minimal definability, and group representation theory to classify critical-point arcs by symmetry (isotropy) and to capture their Puiseux-series structure in 1/d. Key contributions include precise descriptions of tangency-arc structure for type I and II minima, explicit isotypic decompositions of the parameter space, and leading-term Hessian analyses that distinguish hidden from detectable minima via symmetry-breaking arcs. Numerical investigations validate the theoretical predictions by tracing tangency arcs and bounding distances to nearby critical points, illustrating the practical relevance of symmetry-based arguments for understanding nonconvex optimization in neural networks.

Abstract

We consider the optimization problem arising from fitting two-layer ReLU networks with $d$ inputs under the square loss, where labels are generated by a target network. Two infinite families of spurious minima have recently been identified: one whose loss vanishes as $d \to \infty$, and another whose loss remains bounded away from zero. The latter are nevertheless avoided by vanilla SGD, and thus hidden, motivating the search for analytic properties distinguishing the two types. Perhaps surprisingly, the Hessian spectra of hidden and non-hidden minima agree up to terms of order $O(d^{-1/2})$, providing limited explanatory power. Consequently, our analysis of hidden minima proceeds instead via curves along which the loss is minimized or maximized. The main result is that arcs emanating from hidden minima differ, characteristically, by their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent from previous analyses.
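To make the setting concrete, the following is a minimal sketch of the student-teacher objective described in the abstract. Several choices here are assumptions for illustration and are not stated in the abstract: standard Gaussian inputs, second-layer weights fixed to one, an identity target weight matrix `V`, and a Monte Carlo estimate in place of the expected loss. The names `relu_net` and `empirical_loss` are hypothetical.

```python
import numpy as np

def relu_net(W, X):
    # Two-layer ReLU network with unit output weights (an assumption):
    # f(x; W) = sum_i max(0, <w_i, x>), evaluated on every row of X.
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def empirical_loss(W, V, X):
    # Monte Carlo estimate of (1/2) * E[(f(x; W) - f(x; V))^2],
    # where the labels f(x; V) come from the target network.
    residual = relu_net(W, X) - relu_net(V, X)
    return 0.5 * np.mean(residual ** 2)

rng = np.random.default_rng(0)
d = 6
V = np.eye(d)                              # assumed target weights
X = rng.standard_normal((20000, d))        # assumed Gaussian inputs
W = V + 0.1 * rng.standard_normal((d, d))  # a perturbed student

loss_at_target = empirical_loss(V, V, X)
loss_perturbed = empirical_loss(W, V, X)

# Permuting the hidden units of the student leaves the network unchanged,
# which is one source of the symmetry exploited in the analysis.
perm = rng.permutation(d)
loss_permuted_target = empirical_loss(V[perm], V, X)
```

In this sketch the loss is zero at the target and at any permutation of its rows, and positive at a generic perturbation. Spurious minima are local minima whose loss is strictly positive.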

Paper Structure (20 sections, 6 theorems, 20 equations, 2 tables)

Introduction
Framework: the tangency set and symmetry
O-minimal theory
Curve Selection Lemma (CSL)
Representation theory of groups
Expressing curves of critical points as Puiseux series in $d$
Main results: structure and symmetry of tangency arcs
Structure and symmetry of tangency arcs of type I and type II minima
Numerical results: bounding the distance to the nearest critical point
Concluding remarks
Acknowledgements and disclosure of funding
The o-minimal structure $\mathbb{R}_{an}$
Proof of lem:tang_eigsp
Monotonicity theorem
Proof of thm:max_iso
...and 5 more sections

Key Result

Corollary 1

If $f$ is definable then there exists a tangency arc $\gamma$ parameterized by arc length satisfying ${\mathcal{L}}(\gamma(r)) = m(r)$, and similarly for $M(r)$.
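As a hedged illustration of the corollary, the sketch below computes the extreme values of a toy function on circles of increasing radius around a critical point. It assumes that $m(r)$ and $M(r)$ denote the minimum and maximum of the function over the sphere of radius $r$ centered at the critical point; this summary does not define them. The polynomial `f` is a stand-in chosen for illustration and is not the network loss; polynomials are definable, so the corollary applies to it.

```python
import numpy as np

def f(p):
    # Toy polynomial (hence definable) function with a critical point at 0.
    x, y = p[..., 0], p[..., 1]
    return x**2 + 2.0 * y**2

def extremes_on_circle(r, n=4001):
    # Sample the circle of radius r around the origin and return the
    # smallest and largest sampled values together with their locations.
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    pts = r * np.stack([np.cos(theta), np.sin(theta)], axis=-1)
    vals = f(pts)
    i_min, i_max = np.argmin(vals), np.argmax(vals)
    return vals[i_min], pts[i_min], vals[i_max], pts[i_max]

radii = np.linspace(0.1, 1.0, 10)
results = [extremes_on_circle(r) for r in radii]
m_vals = np.array([res[0] for res in results])  # sampled m(r)
M_vals = np.array([res[2] for res in results])  # sampled M(r)
min_pts = np.array([res[1] for res in results]) # points attaining m(r)
max_pts = np.array([res[3] for res in results]) # points attaining M(r)
```

For this toy function the minimizers trace the x-axis and the maximizers trace the y-axis as $r$ varies. These two curves play the role of the tangency arcs along which the function equals $m(r)$ and $M(r)$ respectively.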

Theorems & Definitions (10)

Remark 1
Definition 1
Corollary 1
Lemma 1
Theorem 1
Corollary 2
Definition 2
Theorem 2
Definition 3
Lemma 2 (Arjevani and Field, 2020)
