Implicit regularization of normalized gradient descent

Cédric Josz

TL;DR

This work addresses the problem of finding flat minima of noncoercive, symmetric objectives by running normalized gradient descent (NGD) with slowly decaying step sizes, formalized as $x_{k+1}=x_k - \alpha_k \widehat{\nabla} f(x_k)$. It introduces the normalized subdifferential $\widehat{\nabla} f$ and a $d$-Lyapunov framework for the Euler discretization of the gradient flow, and shows how an implicit regularizer $g$ can bias NGD toward flat minima of $f$ when $f+g$ is coercive. Using variational analysis and stratification theory, the paper derives necessary and sufficient conditions for $g$ to serve as an implicit regularizer, relates stability to flatness, and presents several examples of convergence to flat minima. The results clarify how discretization, symmetry, and conservation interact in nonsmooth dynamics. They give a principled account of implicit regularization in semi-algebraic settings and inform the choice of step-size schedules and regularizers for stable convergence to flat minima.

Abstract

How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.
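The iteration described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it uses the objective $f(x,y)=(xy-1)^2$ and the step size $1/(k+1)^{1/4}$ that appear in the figure captions, and it assumes that away from critical points the normalized subdifferential reduces to $\nabla f/\|\nabla f\|$. The function names, the starting point, the iteration count, and the choice to stay put when the gradient vanishes are all assumptions made for this sketch.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = (x*y - 1)**2."""
    r = 2.0 * (x * y - 1.0)
    return r * y, r * x


def normalized_gradient_descent(x, y, num_steps, c=1.0, p=0.25):
    """Iterate x_{k+1} = x_k - alpha_k * grad f(x_k) / ||grad f(x_k)||
    with step sizes alpha_k = c / (k + 1)**p."""
    for k in range(num_steps):
        gx, gy = grad_f(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            # Assumption of this sketch: do not move at a critical point.
            continue
        alpha = c / (k + 1) ** p
        x -= alpha * gx / norm
        y -= alpha * gy / norm
    return x, y


if __name__ == "__main__":
    # Hypothetical starting point, chosen only for illustration.
    x, y = normalized_gradient_descent(2.0, 1.0, 5000)
    print("final iterate:", (x, y))
    print("f =", (x * y - 1.0) ** 2)
    print("g =", (x * x - y * y) ** 2)
```

Because the steps have fixed length $\alpha_k$, the iterates keep oscillating across the solution set $xy=1$ with an amplitude that shrinks only as fast as $\alpha_k$, so $f$ decays slowly while $g(x,y)=(x^2-y^2)^2$ is driven toward zero.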

Paper Structure

This paper contains 12 sections, 33 theorems, 118 equations, 6 figures, and 1 table.

Key Result

Proposition 2.6

Let $F:\mathbb{R}^n \rightrightarrows \mathbb{R}^n$ be locally bounded with a closed graph and $g:\mathbb{R}^n \to \overline{\mathbb{R}}$ be $C^{2,2}$ near $\overline{x}\in \mathbb{R}^n$. If then $g$ is 2-d-Lyapunov near $\overline{x}$.

Figures (6)

  • Figure 1: $f(x,y)=(xy-1)^2$, $g(x,y)=(x^2-y^2)^2$, and $\lambda = 0.1$.
  • Figure 2: Normalized gradient descent with step size $1/(k+1)^{1/4}$.
  • Figure 3: Normalized gradient descent with step size $1/\sqrt{k+1}$.
  • Figure 4: Normalized gradient descent with step size $0.4/(k+1)^{1/4}$ and 4 initial points.
  • Figure 5: Normalized gradient descent with step sizes $1/(k+1)^{1/2}$ and $1/(k+1)^{1/3}$, respectively.
  • ...and 1 more figure
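
For the pair $f$, $g$ listed under Figure 1, the effect of one normalized gradient step on $g$ can be checked by hand. The computation below is a worked illustration rather than a statement taken from the paper; it assumes that away from critical points the normalized gradient is $\nabla f/\|\nabla f\|$, and the symbol $s_k$ is introduced here for the sign of $x_k y_k - 1$. Since $\nabla f(x,y) = 2(xy-1)\,(y,x)$, one step reads

$$
x_{k+1} = x_k - \alpha_k s_k \frac{y_k}{\sqrt{x_k^2+y_k^2}}, \qquad
y_{k+1} = y_k - \alpha_k s_k \frac{x_k}{\sqrt{x_k^2+y_k^2}},
$$

and the cross terms cancel when the two squares are subtracted, which gives

$$
x_{k+1}^2 - y_{k+1}^2 = \left(1 - \frac{\alpha_k^2}{x_k^2+y_k^2}\right)\left(x_k^2 - y_k^2\right),
\qquad
g(x_{k+1},y_{k+1}) = \left(1 - \frac{\alpha_k^2}{x_k^2+y_k^2}\right)^{2} g(x_k,y_k).
$$

So $g$ decreases at every step with $0 < \alpha_k^2 < 2(x_k^2+y_k^2)$ and $x_k^2 \neq y_k^2$. If the iterates stay bounded, $g$ tends to zero whenever $\sum_k \alpha_k^2$ diverges, which holds for instance for $\alpha_k = 1/(k+1)^{1/4}$. In the continuous-time gradient flow, by contrast, $x^2-y^2$ is conserved, so the drift toward $x^2=y^2$ comes from the discretization.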

Theorems & Definitions (92)

  • Example 1.1
  • proof
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Proposition 2.6
  • Theorem 2.8
  • Theorem 2.9
  • ...and 82 more