On the geometry of flat minima

Cédric Josz

TL;DR

The paper proposes a geometry-driven definition of flat minima by measuring maximal local variation around a point, using $\accentset{\circ}{f}$ and its dual $\overline{f}$ to compare points via a robust preorder. It develops a comprehensive calculus framework across nonsmooth and definable settings, and then ties flatness to dynamical structure through conserved quantities and linear symmetries, showing that gradient flows associated with these invariants flatten the objective landscape over time. A key contribution is the matrix factorization analysis, where balanced minima ($X^T X=Y^T Y$) are shown to be flat, with a strong local-global property under Frobenius norms and explicit global minimizer forms, linking flatness to spectral properties of the Hessian. The results are illustrated with diverse examples, clarifying when flatness coincides with or diverges from traditional notions of sharpness, and highlighting the role of symmetry and conservation in shaping landscape geometry and convergence behavior.

Abstract

What does it mean to be flat? We propose to define it by measuring the maximal variation around a point, or from a dual perspective, the distance to neighboring level sets. After developing some calculus rules, we show how flat minima, conservation laws, and symmetries are intertwined. Gradient flows of conserved quantities are of particular interest, due to their flattening properties.
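
To make the link between conservation laws and the balanced condition $X^T X = Y^T Y$ concrete, here is a short worked derivation. It is a sketch under assumptions not stated in this summary: the objective is taken to be $f(X,Y) = g(XY^T)$ for some differentiable $g$, with $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{n \times r}$, and $G = \nabla g(XY^T)$. The paper's own setup may differ.

$$
\begin{aligned}
\dot X &= -\nabla_X f = -G Y, \qquad \dot Y = -\nabla_Y f = -G^T X,\\
\frac{d}{dt}\left(X^T X\right) &= \dot X^T X + X^T \dot X = -Y^T G^T X - X^T G Y,\\
\frac{d}{dt}\left(Y^T Y\right) &= \dot Y^T Y + Y^T \dot Y = -X^T G Y - Y^T G^T X,\\
\frac{d}{dt}\left(X^T X - Y^T Y\right) &= 0.
\end{aligned}
$$

Under these assumptions, $C(X,Y) = X^T X - Y^T Y$ is constant along the gradient flow of $f$, and the balanced points are exactly the zero set of $C$.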

Paper Structure

This paper contains 16 sections, 30 theorems, 165 equations, 1 figure.

Key Result

Proposition 3.11

Under assum:isolated,

Figures (1)

  • Figure 1: 1000 iterations of gradient descent applied to $f(x_1,x_2)=x_2^2+x_1^2x_2^4$ with step length $(k+1)^{-1/6}$ initialized at $(3.2,0.6)$.
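
The function in Figure 1 can be worked out by hand, which may help explain why it is a natural test case. The computation below is elementary calculus on the stated $f$; the interpretation that follows is one reading of the example, not a claim taken from the paper.

$$
\begin{aligned}
\nabla f(x_1,x_2) &= \left(2x_1x_2^4,\; 2x_2 + 4x_1^2x_2^3\right),\\
\operatorname*{argmin} f &= \{(a,0) : a \in \mathbb{R}\}, \qquad \min f = 0,\\
\nabla^2 f(a,0) &= \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} \quad \text{for every } a \in \mathbb{R},\\
f(a,t) &= t^2 + a^2t^4.
\end{aligned}
$$

Every minimizer has the same Hessian, so a purely second-order measure of sharpness cannot distinguish between them. The fourth-order term can: the value $f(a,t)$ reached by moving a distance $t$ in the $x_2$ direction grows with $|a|$, so on this reading $(0,0)$ is the minimizer around which $f$ varies least.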

Theorems & Definitions (102)

  • proof
  • Definition 2.2
  • Definition 3.1
  • Definition 3.2
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • ...and 92 more