Implicit regularization of normalized gradient descent
Cédric Josz
TL;DR
This work addresses finding flat minima of noncoercive, symmetric objectives by running normalized gradient descent (NGD) with slowly decaying step sizes, $x_{k+1}=x_k - \alpha_k \widehat{\nabla} f(x_k)$. It introduces the normalized subdifferential $\widehat{\nabla} f$ and a $d$-Lyapunov framework for the Euler discretization of the gradient flow, and shows how an implicit regularizer $g$ can bias NGD toward flat minima of $f$ when $f+g$ is coercive. Leveraging variational analysis and stratification theory, the author derives necessary and sufficient conditions for $g$ to serve as an implicit regularizer, relates stability to flatness, and presents several examples of convergence to flat minima. The results clarify how discretization, symmetry, and conservation laws interact in nonsmooth dynamics. They offer a principled approach to implicit regularization in semi-algebraic settings and guide the design of step-size schedules and regularizers that converge stably to flat minima.
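The NGD iteration $x_{k+1}=x_k - \alpha_k \widehat{\nabla} f(x_k)$ can be sketched for a smooth objective, where away from critical points the normalized direction is simply $\nabla f(x)/\|\nabla f(x)\|$. The toy objective $f(x,y)=(xy-1)^2/2$, the starting point $(3, 0.5)$, and the schedule $\alpha_k = 0.5/(k+1)^{1/4}$ are illustrative assumptions of ours, not taken from the paper. This $f$ is noncoercive and invariant under $(x,y)\mapsto(tx,y/t)$. Its minimizers form the curve $xy=1$, and the flattest ones are $\pm(1,1)$, where the Hessian trace $x^2+y^2$ is smallest. The sketch also simplifies the paper's normalized subdifferential by just stopping at an exact critical point.

```python
import math


def grad_f(x, y):
    """Gradient of the hypothetical toy objective f(x, y) = (xy - 1)^2 / 2."""
    h = x * y - 1.0
    return h * y, h * x


def ngd(x, y, steps, alpha0=0.5, power=0.25):
    """Normalized gradient descent with alpha_k = alpha0 / (k + 1) ** power.

    For power <= 1/2 the squared step sizes are not summable, i.e. the steps
    diminish "slowly". alpha0 and power are illustrative choices.
    """
    for k in range(steps):
        gx, gy = grad_f(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            break  # exact critical point: this sketch simply stops here
        alpha = alpha0 / (k + 1) ** power
        # Both coordinates are updated from the gradient at the old point.
        x -= alpha * gx / norm
        y -= alpha * gy / norm
    return x, y


if __name__ == "__main__":
    x, y = ngd(3.0, 0.5, steps=100_000)
    print(f"x = {x:.6f}, y = {y:.6f}, xy = {x * y:.6f}, x^2 - y^2 = {x * x - y * y:.2e}")
```

Starting from the unbalanced point $(3, 0.5)$, the iterates should end close to the flat minimum $(1, 1)$ rather than at the nearest point of the curve $xy=1$. Gradient flow from the same start would instead preserve $x^2-y^2$ and converge to a sharper minimizer.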
Abstract
How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.
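The role of "sufficiently slowly diminishing" steps can be seen in a worked toy example. We assume, purely for illustration, that $f(x,y)=(xy-1)^2/2$; this is not an example quoted from the paper. Write $h = xy-1$, $s=\operatorname{sign}(h)$ and $r=\sqrt{x^2+y^2}$. Along the gradient flow the imbalance $x^2-y^2$ is conserved. One normalized step, however, shrinks it by an exact factor:

```latex
\begin{aligned}
&\tfrac{d}{dt}\,(x^2-y^2) = 2x(-hy) - 2y(-hx) = 0
  \quad\text{(conserved by the gradient flow)},\\[4pt]
&x^{+} = x - \alpha s\,\tfrac{y}{r}, \qquad y^{+} = y - \alpha s\,\tfrac{x}{r}
  \quad\text{(one NGD step)},\\[4pt]
&(x^{+})^2-(y^{+})^2
  = x^2 - \tfrac{2\alpha s\,xy}{r} + \tfrac{\alpha^2 y^2}{r^2}
  - y^2 + \tfrac{2\alpha s\,xy}{r} - \tfrac{\alpha^2 x^2}{r^2}
  = (x^2-y^2)\Bigl(1-\tfrac{\alpha^2}{r^2}\Bigr),\\[4pt]
&\bigl|x_k^2-y_k^2\bigr| = \bigl|x_0^2-y_0^2\bigr|\prod_{j<k}\Bigl(1-\tfrac{\alpha_j^2}{r_j^2}\Bigr)
  \quad\text{(assuming } \alpha_j < r_j\text{)}.
\end{aligned}
```

If $r_j$ stays bounded and $\sum_j \alpha_j^2 = \infty$ (for instance $\alpha_j \propto (j+1)^{-1/2}$ or slower), the product tends to zero and the imbalance vanishes. Combined with $x_k y_k \to 1$, the iterates then approach the flat minima $\pm(1,1)$. In this example $|x^2-y^2|$ plays the part of a Lyapunov-like quantity that only the discretization decreases.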
