Convergence of Adam in Deep ReLU Networks via Directional Complexity and Kakeya Bounds

Anupama Sridhar, Alexander Johansen

TL;DR

This work tackles the long-standing challenge of proving global convergence for Adam in non-smooth ReLU networks by developing a geometric framework based on stratified Morse theory, Whitney stratification, and Kakeya geometry. It introduces a six-part refinement that collapses region-crossing complexity from exponential to near-linear in the effective gradient dimension $d_{\mathrm{eff}}$, and uses a Uniform Low-Barrier connectivity assumption to guarantee global optimality across cone-wise minima. The authors prove the first global convergence rate for Adam in non-smooth settings and a Kakeya-based generalization bound that scales as $\tilde{O}(\sqrt{d_{\mathrm{eff}}/n})$, improving over traditional PAC-Bayes bounds and avoiding NTK or convexity assumptions. The results yield practical guidance for tuning Adam in deep ReLU models and extend to Hölder-smooth losses, adversarial perturbations, and polynomial-mixing data streams, giving a theoretical account of Adam’s empirical success in high-dimensional, non-smooth landscapes.

Abstract

First-order adaptive optimization methods like Adam are the default choice for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in deep ReLU networks, remains limited. ReLU activations create exponentially many region boundaries where standard smoothness assumptions break down. We derive the first $\tilde{O}\bigl(\sqrt{d_{\mathrm{eff}}/n}\bigr)$ generalization bound for Adam in deep ReLU networks and the first global-optimality convergence result for Adam in the non-smooth, non-convex ReLU landscape without a global PL or convexity assumption. Our analysis is based on stratified Morse theory and novel results on Kakeya sets. We develop a multi-layer refinement framework that progressively tightens bounds on region crossings, and we prove that the number of region crossings collapses from exponential to near-linear in the effective dimension. Using a Kakeya-based method, we give a tighter generalization bound than PAC-Bayes approaches and establish convergence under a mild uniform low-barrier assumption.

Paper Structure

This paper contains 127 sections, 47 theorems, 204 equations, 1 table.

Key Result

Theorem 1

For any $\Pi$ hyperplanes in $\mathbb{R}^D$,
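The statement of Theorem 1 is cut off here, so its exact bound is not reproduced. For orientation only, the sketch below computes the classical hyperplane-arrangement count: $\Pi$ hyperplanes in $\mathbb{R}^D$ divide the space into at most $\sum_{k=0}^{D}\binom{\Pi}{k}$ regions. That this is the kind of baseline the theorem refers to is an assumption, and the function name `max_regions` is hypothetical, not taken from the paper.

```python
from math import comb


def max_regions(num_hyperplanes: int, dim: int) -> int:
    """Classical upper bound on the number of regions that
    `num_hyperplanes` hyperplanes can cut out of R^dim.

    Illustrative only: this is the standard arrangement count,
    assumed (not confirmed) to be related to the paper's baseline.
    """
    # math.comb(n, k) returns 0 when k > n, so the sum is safe
    # even when dim exceeds the number of hyperplanes.
    return sum(comb(num_hyperplanes, k) for k in range(dim + 1))


if __name__ == "__main__":
    # With the dimension fixed, the count grows polynomially in the
    # number of hyperplanes; with dim >= num_hyperplanes it is 2**num_hyperplanes.
    for n_planes in (3, 10, 20):
        print(n_planes, max_regions(n_planes, 2), max_regions(n_planes, n_planes))
```

The contrast between the two columns (low fixed dimension versus dimension equal to the number of hyperplanes) shows why a bound depending on an effective dimension can be far smaller than the worst-case exponential count.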

Theorems & Definitions (89)

  • Theorem 1: Stratified Morse Region Count (baseline)
  • Theorem 2: Phase I: Finite Region Bound
  • Theorem 3: Gradient rate under finite crossings
  • Theorem 4: Generalization via Kakeya
  • Theorem 5: Global Convergence Rate
  • Proposition 1: Polynomial Structure
  • Theorem 6: Affine Approximation Error Bound
  • proof
  • Lemma 1: Local mask freezing
  • proof
  • ...and 79 more