Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Peiyuan Zhang; Amin Karbasi

TL;DR

This work shows that Edge of Stability (EoS) can occur in gradient descent (GD) with a large step-size even when the loss is quadratic, in a regression task with the quadratic parameterization β_w = w_+^2 - w_-^2. It combines empirical evidence with a rigorous one-sample analysis (d = 2) to prove that GD converges to a linear interpolator β_∞ within the EoS regime, with distinct behavior depending on whether ημ < 1 or ημ > 1, and with bounds on the generalization error relative to sparsity priors. The study connects EoS to the implicit bias of depth-2 diagonal linear networks and extends the insights to multi-sample settings, showing that overparameterization (d ≥ n) is necessary for EoS in the quadratic-loss diagonal linear network setting. Overall, the paper shows that a subquadratic loss is not a strict prerequisite for EoS, and it details the phase-transition dynamics and convergence properties under large GD step-sizes. The results bear on the design and analysis of optimization in overparameterized linear models and on the interpretation of implicit bias in practical neural-network-like architectures.

Abstract

Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge this traditional idea by empirically demonstrating that Gradient Descent (GD) converges even when the step-size $η$ exceeds the threshold of $2/L$, where $L$ is the global smoothness constant. This is usually known as the Edge of Stability (EoS) phenomenon. A widely held belief suggests that an objective function with subquadratic growth plays an important role in incurring EoS. In this paper, we provide a more comprehensive answer by considering the task of finding a linear interpolator $β \in \mathbb{R}^{d}$ for regression with loss function $l(\cdot)$, where $β$ admits the parameterization $β = w_{+}^2 - w_{-}^2$. Contrary to previous work suggesting that a subquadratic $l$ is necessary for EoS, our novel finding reveals that EoS occurs even when $l$ is quadratic, under proper conditions. This argument is made rigorous by both empirical and theoretical evidence, demonstrating that the GD trajectory converges to a linear interpolator in a non-asymptotic way. Moreover, the model under quadratic $l$, also known as a depth-$2$ diagonal linear network, remains largely unexplored under the EoS regime. Our analysis thus sheds new light on the implicit bias of diagonal linear networks when a larger step-size is employed, enriching the understanding of EoS on more practical models.
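
To make the setting concrete, the following is a minimal sketch of GD on a one-sample, quadratically parameterized regression problem. It is an illustration under stated assumptions, not the authors' code: the loss normalization $\frac{1}{2}(\langle x, β_w\rangle - y)^2$, the uniform initialization scale `alpha`, the function names, and the sample values are all hypothetical choices made here. With this normalization, the gradient with respect to $w_{+}$ is $2 r\, x \odot w_{+}$ and with respect to $w_{-}$ is $-2 r\, x \odot w_{-}$, where $r = \langle x, β_w\rangle - y$ is the residual.

```python
import numpy as np


def gd_diag_linear(x, y, eta, alpha=0.1, steps=2000):
    """Run GD on 0.5 * (<x, w_plus**2 - w_minus**2> - y)**2 for one sample.

    Assumptions (not taken from the paper): the 1/2 loss normalization
    and the uniform initialization w_plus = w_minus = alpha.
    """
    d = x.shape[0]
    w_plus = np.full(d, alpha)
    w_minus = np.full(d, alpha)
    losses = []
    for _ in range(steps):
        beta = w_plus**2 - w_minus**2
        r = x @ beta - y  # scalar residual
        losses.append(0.5 * r**2)
        # Gradient steps: d/dw_plus = 2 r x * w_plus, d/dw_minus = -2 r x * w_minus.
        w_plus = w_plus * (1.0 - 2.0 * eta * r * x)
        w_minus = w_minus * (1.0 + 2.0 * eta * r * x)
    return w_plus, w_minus, np.array(losses)


def sharpness_at_interpolator(w_plus, w_minus, x):
    """Top Hessian eigenvalue where the residual is zero.

    At r = 0 the Hessian is the rank-one matrix grad(r) grad(r)^T, so its
    only nonzero eigenvalue is ||grad(r)||^2 = 4 * sum(x**2 * (w_plus**2 + w_minus**2)).
    """
    return 4.0 * np.sum(x**2 * (w_plus**2 + w_minus**2))


if __name__ == "__main__":
    x = np.array([1.0, 0.5])  # hypothetical sample with d = 2
    y = 1.0
    w_p, w_m, losses = gd_diag_linear(x, y, eta=0.05)
    beta = w_p**2 - w_m**2
    print("residual:", x @ beta - y)
    print("2 / sharpness:", 2.0 / sharpness_at_interpolator(w_p, w_m, x))
```

The small step-size used here stays in the classical stable regime. To probe the regime the abstract describes, one could raise `eta` toward and beyond the printed `2 / sharpness` value and inspect whether `losses` decreases monotonically or oscillates; the sketch makes no claim about what that experiment shows for any particular value.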
