Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability

Eric Gan

Abstract

Empirically, modern deep learning training often occurs at the Edge of Stability (EoS), where the sharpness of the loss exceeds the threshold below which classical convergence analysis applies. Despite recent progress, existing theoretical explanations of EoS either rely on restrictive assumptions or focus on specific squared-loss-type objectives. In this work, we introduce and study a structural property of loss functions that we term product-stability. We show that for losses with product-stable minima, gradient descent applied to objectives of the form $(x,y) \mapsto l(xy)$ can provably converge to the local minimum even when training in the EoS regime. This framework substantially generalizes prior results and applies to a broad class of losses, including binary cross entropy. Using bifurcation diagrams, we characterize the resulting training dynamics, explain the emergence of stable oscillations, and precisely quantify the sharpness at convergence. Together, our results offer a principled explanation for stable EoS training for a wider class of loss functions.
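As an illustration of the setting, the sketch below runs gradient descent on a toy instance of the objective class $(x,y) \mapsto l(xy)$ with the illustrative quadratic choice $l(z) = (z-1)^2$ (an assumption for this example; the paper's results cover more general losses such as binary cross entropy). Starting near a sharp minimum whose sharpness exceeds the classical stability threshold $2/\eta$, the iterates escape and settle at a flatter minimum, in the spirit of the EoS dynamics described in the paper:

```python
import numpy as np

def gd_on_product_loss(x, y, lr, steps):
    """Run gradient descent on L(x, y) = l(x*y) with the toy loss l(z) = (z - 1)^2."""
    for _ in range(steps):
        r = x * y - 1.0                       # residual: l'(z) = 2*r at z = x*y
        gx, gy = 2.0 * r * y, 2.0 * r * x     # chain rule: dL/dx, dL/dy
        x, y = x - lr * gx, y - lr * gy
    return x, y

def sharpness(x, y):
    """Top Hessian eigenvalue at a minimum (x*y = 1): l''(z)*(x^2 + y^2) = 2*(x^2 + y^2)."""
    return 2.0 * (x ** 2 + y ** 2)

lr = 0.2                      # classical stability requires sharpness < 2/lr = 10
x0, y0 = 3.0, 0.34            # near a sharp minimum: sharpness(x0, y0) ≈ 18.2 > 10
x, y = gd_on_product_loss(x0, y0, lr, 500)

print(abs(x * y - 1.0))       # small: the iterates land on a minimum anyway
print(sharpness(x, y) < sharpness(x0, y0))  # the final minimum is flatter
```

The transient is oscillatory and the particular endpoint depends on the initialization, but the qualitative picture (escape from the sharp minimum, convergence to a flatter one below the threshold) matches the behavior the paper analyzes.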

Paper Structure

This paper contains 35 sections, 23 theorems, 99 equations, and 6 figures.

Key Result

Lemma 3.1

The trace of the Hessian of $\mathcal{L}$ is given by $\mathop{\mathrm{tr}}\nolimits(\nabla^2 \mathcal{L}) = l''(z)\, s$. If $l'(z) = 0$, then the sharpness of $\mathcal{L}$ is also given by $\lambda = \mathop{\mathrm{tr}}\nolimits(\nabla^2 \mathcal{L}) = l''(z)\, s$.
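To make the quantities concrete: for $\mathcal{L}(x,y) = l(xy)$ with $z = xy$, the chain rule gives $\nabla^2 \mathcal{L} = l''(z)\begin{pmatrix} y^2 & xy \\ xy & x^2 \end{pmatrix} + l'(z)\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, so the trace is $l''(z)(x^2 + y^2)$, consistent with the lemma if $s$ denotes $x^2 + y^2$ (our reading of $s$ from this computation; the paper defines it precisely). A minimal finite-difference check under that reading, with the illustrative loss $l(z) = (z-1)^2$:

```python
import numpy as np

def l(z):                     # illustrative loss with a minimum at z = 1
    return (z - 1.0) ** 2

def L(p):                     # objective L(x, y) = l(x * y)
    x, y = p
    return l(x * y)

def numerical_hessian(f, p, h=1e-4):
    """Central finite-difference Hessian of f at p."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h
            pm = p.copy(); pm[i] += h; pm[j] -= h
            mp = p.copy(); mp[i] -= h; mp[j] += h
            mm = p.copy(); mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h * h)
    return H

p = np.array([2.0, 0.5])      # x * y = 1, so l'(z) = 0 here
H = numerical_hessian(L, p)
s = p[0] ** 2 + p[1] ** 2     # s = x^2 + y^2 = 4.25

print(np.trace(H))            # ≈ l''(1) * s = 2 * 4.25 = 8.5
print(np.linalg.eigvalsh(H)[-1])  # sharpness also ≈ 8.5, as the lemma states
```

Since $l'(z) = 0$ at the minimum, the Hessian is the rank-one matrix $l''(z)\,(y, x)(y, x)^\top$, so its only nonzero eigenvalue equals its trace, which is why sharpness and trace coincide there.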

Figures (6)

  • Figure 1: EoS Dynamics in the xy Plane. Iterates start on the right near a high-sharpness minimum. They quickly diverge away from the sharp minimum before drifting towards a flatter minimum on the left.
  • Figure 3: EoS training dynamics for $l = \text{MLSq}_{1,2}$ (\ref{eq:mlsq_definition}) when started very close to the EoS threshold. The iterates do not converge to the final sharpness predicted by \ref{thm:final_sharpness}, showing that the $\delta$ gap is required.
  • Figure 4: End-of-training dynamics for the runs in \ref{fig:eos_training_dynamics}. One can observe Phase III of the training dynamics, where the iterates converge to $z_*$. The limiting sharpness is just below the EoS threshold and very close to the value predicted by \ref{thm:final_sharpness}.
  • Figure 5: Training dynamics of a fully-connected tanh network on CIFAR-10. a) shows the training loss, which consistently decreases over long timescales. The loss also oscillates while in the EoS regime; see \ref{fig:oscillation} for a zoomed-in view. b) shows the sharpness, with the dotted lines representing the EoS threshold. Training enters the EoS regime, where the sharpness oscillates around the EoS threshold. c) shows the product-stability calculated using directional derivatives along the direction of maximal sharpness. The product-stability is positive, indicating stability.
  • Figure 6: Loss Oscillation in EoS Regime. Zoomed-in version of \ref{fig:a}, showing that the loss is oscillating in the EoS regime.
  • ...and 1 more figure

Theorems & Definitions (44)

  • Lemma 3.1
  • Definition 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Lemma 5.1
  • Theorem 5.2
  • Lemma 6.1
  • Lemma 6.2
  • Lemma 6.4
  • Lemma 6.5
  • ...and 34 more