On the geometry of flat minima
Cédric Josz
TL;DR
The paper proposes a geometric definition of flat minima based on the maximal local variation of the objective around a point, using $\accentset{\circ}{f}$ and its dual $\overline{f}$ to compare points via a robust preorder. It develops calculus rules covering nonsmooth and definable settings, then ties flatness to conserved quantities and linear symmetries, showing that gradient flows of conserved quantities have flattening properties. A key contribution is the analysis of matrix factorization, where balanced minima ($X^T X = Y^T Y$) are shown to be flat, with a strong local-global property under Frobenius norms and explicit forms for the global minimizers, linking flatness to spectral properties of the Hessian. Examples clarify when this notion of flatness coincides with or diverges from traditional notions of sharpness, and highlight the role of symmetry and conservation in shaping landscape geometry and convergence behavior.
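The link between conserved quantities and linear symmetries in matrix factorization can be checked numerically. The sketch below assumes the least-squares objective $f(X,Y) = \tfrac{1}{2}\|XY^T - M\|_F^2$, which this summary does not spell out, and the function names (`grads`, `imbalance_rate`, `objective`) are illustrative rather than taken from the paper. It verifies two standard facts: the imbalance $X^T X - Y^T Y$ has zero time derivative along the gradient flow of $f$, and $f$ is invariant under $(X, Y) \mapsto (XA, YA^{-T})$ for invertible $A$.

```python
import numpy as np

def objective(X, Y, M):
    """Assumed objective: 0.5 * ||X Y^T - M||_F^2."""
    return 0.5 * np.linalg.norm(X @ Y.T - M, "fro") ** 2

def grads(X, Y, M):
    """Gradients of the assumed objective with respect to X and Y."""
    R = X @ Y.T - M          # residual
    return R @ Y, R.T @ X

def imbalance_rate(X, Y, M):
    """d/dt (X^T X - Y^T Y) along the gradient flow (X', Y') = -(gX, gY)."""
    gX, gY = grads(X, Y, M)
    return -(gX.T @ X + X.T @ gX) + (gY.T @ Y + Y.T @ gY)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
Y = rng.standard_normal((4, 2))
M = rng.standard_normal((5, 4))

# Conservation: the imbalance does not change along the flow.
rate = imbalance_rate(X, Y, M)
print("max |d/dt (X^T X - Y^T Y)| =", np.abs(rate).max())

# Linear symmetry: (X A, Y A^{-T}) leaves the product X Y^T unchanged.
A = rng.standard_normal((2, 2)) + 2.0 * np.eye(2)  # hypothetical invertible matrix
f_before = objective(X, Y, M)
f_after = objective(X @ A, Y @ np.linalg.inv(A).T, M)
print("objective before / after symmetry:", f_before, f_after)
```

Since the symmetry moves a point along a level set without changing the objective, it can change the imbalance freely; balanced points are the ones singled out on each such orbit.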
Abstract
What does it mean to be flat? We propose to define it by measuring the maximal variation around a point, or from a dual perspective, the distance to neighboring level sets. After developing some calculus rules, we show how flat minima, conservation laws, and symmetries are intertwined. Gradient flows of conserved quantities are of particular interest, due to their flattening properties.
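As a rough illustration of "maximal variation around a point", the sketch below estimates $\sup_{\|z - z_0\| \le r} |f(z) - f(z_0)|$ by sampling a disc of radius $r$. This is only one possible reading of the definition, and the test function $f(x, y) = (xy - 1)^2$ (a scalar factorization), the radius, and the helper name `max_variation` are assumptions chosen for illustration, not taken from the paper. The sampled value is a lower bound on the true supremum.

```python
import numpy as np

def f(z):
    """Hypothetical scalar factorization objective; minima lie on xy = 1."""
    x, y = z
    return (x * y - 1.0) ** 2

def max_variation(func, z0, radius, n_angles=720, n_radii=20):
    """Sampled estimate of sup |func(z) - func(z0)| over the disc ||z - z0|| <= radius."""
    z0 = np.asarray(z0, dtype=float)
    base = func(z0)
    best = 0.0
    for rho in np.linspace(0.0, radius, n_radii):
        for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
            z = z0 + rho * np.array([np.cos(theta), np.sin(theta)])
            best = max(best, abs(func(z) - base))
    return best

# Two global minima of f: a balanced one (x^2 = y^2) and an unbalanced one.
balanced = max_variation(f, (1.0, 1.0), radius=0.1)
unbalanced = max_variation(f, (4.0, 0.25), radius=0.1)
print("balanced minimum (1, 1):      ", balanced)
print("unbalanced minimum (4, 0.25): ", unbalanced)
```

In this toy case the balanced minimum shows the smaller variation, which matches the first-order estimate $r\sqrt{a^2 + a^{-2}}$ for the size of $xy - 1$ near $(a, 1/a)$, minimized at $a = 1$.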
