Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$

Bowei Zhu; Shaojie Li; Mingyang Yi; Yong Liu

TL;DR

This paper addresses the problem of achieving high-probability, dimension-free excess risk bounds for learning with loss functions that are Lipschitz, smooth, and satisfy either the Polyak-Łojasiewicz (PL) condition or strong convexity. By introducing a notion of uniform stability in gradients, the authors connect the generalization of gradients to excess risk and derive a sharp bound of order $O\left(\log^2(n)/n^2\right)$ for ERM, PGD, and SGD. The results improve over prior $\tilde{O}(1/n)$-type bounds by using gradient stability together with the PL and smoothness structure to control optimization and generalization error jointly. The bounds are dimension-free and apply to common optimization algorithms, and they address open questions about improving stability-based bounds in nonconvex settings. Overall, the work provides a rigorous route to substantially sharper risk guarantees via gradient-level stability under standard regularity assumptions.
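To give a feel for the size of the improvement, the following minimal sketch compares the two rates numerically. It assumes all constants and confidence terms are dropped (the bounds hold only up to such factors), and the function names `prior_rate`, `sharper_rate`, and `compare_rates` are hypothetical helpers introduced here for illustration; they are not from the paper.

```python
import math


def prior_rate(n: int) -> float:
    """Prior-style rate log(n)/n, with constants ignored (an assumption)."""
    return math.log(n) / n


def sharper_rate(n: int) -> float:
    """Sharper rate log^2(n)/n^2, with constants ignored (an assumption)."""
    return math.log(n) ** 2 / n ** 2


def compare_rates(sample_sizes):
    """Return (n, prior, sharper) triples for each sample size."""
    return [(n, prior_rate(n), sharper_rate(n)) for n in sample_sizes]


if __name__ == "__main__":
    for n, slow, fast in compare_rates([10**2, 10**3, 10**4, 10**5]):
        print(f"n={n:>6}  log(n)/n={slow:.3e}  log^2(n)/n^2={fast:.3e}")
```

With constants ignored, the sharper rate is exactly the square of the prior one, so the gap between them widens as $n$ grows.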

Abstract

Prior work (Klochkov & Zhivotovskiy, 2021) establishes high-probability excess risk bounds of order at most $O\left(\log (n)/n\right)$ via algorithmic stability for strongly convex learners. We show that under similar common assumptions (the Polyak-Łojasiewicz condition, smoothness, and Lipschitz continuity of the losses), rates of up to $O\left(\log^2(n)/n^2\right)$ are achievable. To our knowledge, our analysis also provides the tightest high-probability bounds for gradient-based generalization gaps in nonconvex settings.
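For reference, the three assumptions named above are usually stated as follows. This is a sketch of the standard textbook forms, not a quotation from the paper: the symbols $F$, $f(w; z)$, $\mu$, $\gamma$, and $M$ are notation assumed here, and the paper's exact formulation (for example, which risk the PL condition is imposed on) may differ.

$$
\begin{aligned}
&\text{PL condition:} && F(w) - \inf_{w'} F(w') \le \frac{1}{2\mu}\,\|\nabla F(w)\|^2, \\
&\text{Smoothness:} && \|\nabla f(w; z) - \nabla f(w'; z)\| \le \gamma\,\|w - w'\|, \\
&\text{Lipschitz continuity:} && |f(w; z) - f(w'; z)| \le M\,\|w - w'\|.
\end{aligned}
$$

A $\mu$-strongly convex function satisfies the PL condition with the same $\mu$, while PL functions need not be convex, which is why results under the PL condition extend to some nonconvex problems.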
