Stability and Sharper Risk Bounds with Convergence Rate $\tilde{O}(1/n^2)$

Bowei Zhu, Shaojie Li, Mingyang Yi, Yong Liu

TL;DR

This paper addresses the problem of achieving high-probability, dimension-free excess risk bounds for learning with loss functions that are Lipschitz, smooth, and satisfy the Polyak-Lojasiewicz condition or strong convexity. By introducing a notion of uniform stability in gradients, the authors connect gradient generalization to excess risk and derive a sharp bound of order $\tilde{O}\left(\log^2(n)/n^2\right)$ for ERM, PGD, and SGD. The results improve over the prior $\tilde{O}(1/n)$-type bounds by leveraging gradient stability and the PL/smooth structure to control optimization and generalization error jointly. The findings are dimension-free and applicable to common optimization algorithms, enabling tighter risk guarantees in practice and addressing open questions about improving stability-based bounds in nonconvex settings. Overall, the work provides a rigorous route to substantially sharper generalization performance via gradient-level stability under standard regularity assumptions.
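The TL;DR sets the Polyak-Lojasiewicz (PL) condition alongside strong convexity. In one common form, a differentiable function $F$ with minimum value $F^*$ satisfies the PL condition with constant $\mu > 0$ if $\|\nabla F(w)\|^2 \ge 2\mu\,(F(w) - F^*)$ for all $w$. Every $\mu$-strongly convex function satisfies this inequality, and so do some nonconvex functions. That is why PL-based results can reach nonconvex settings. The paper's exact PL convention may differ from this form. The sketch below is a hypothetical illustration and is not taken from the paper. It checks the inequality numerically on a grid for the standard textbook function $f(x) = x^2 + 3\sin^2(x)$, whose minimum is $f^* = 0$ at $x = 0$. A grid check is evidence, not a proof.

```python
import math

# Hypothetical 1-D example (not from the paper): f(x) = x^2 + 3 sin^2(x), with f* = 0 at x = 0.
def f(x: float) -> float:
    return x * x + 3.0 * math.sin(x) ** 2

def grad(x: float) -> float:
    # d/dx [3 sin^2 x] = 6 sin x cos x = 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def curvature(x: float) -> float:
    # Second derivative: d/dx [2x + 3 sin(2x)] = 2 + 6 cos(2x)
    return 2.0 + 6.0 * math.cos(2.0 * x)

def pl_ratio(x: float) -> float:
    """|f'(x)|^2 / (2 (f(x) - f*)); the PL condition with constant mu means this is >= mu."""
    return grad(x) ** 2 / (2.0 * f(x))

# Uniform grid on [-10, 10]; skip a tiny neighbourhood of the minimizer to avoid 0/0.
xs = [-10.0 + 20.0 * i / 200_000 for i in range(200_001)]
xs = [x for x in xs if abs(x) > 1e-3]

min_ratio = min(pl_ratio(x) for x in xs)
min_curvature = min(curvature(x) for x in xs)

print(f"smallest PL ratio on grid:   {min_ratio:.4f}")      # bounded away from 0: PL-like on the grid
print(f"smallest curvature on grid: {min_curvature:.4f}")  # negative: f is not convex
```

The smallest ratio stays well above $1/32$, the PL constant usually quoted for this function. The second derivative $2 + 6\cos(2x)$ still becomes negative (for example at $x = \pi/2$), so the function is PL but not convex.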

Abstract

Prior work (Klochkov & Zhivotovskiy, 2021) uses algorithmic stability to establish high-probability excess risk bounds of at most $O\left(\log (n)/n\right)$ for strongly convex learners. We show that under similar common assumptions on the losses (the Polyak-Lojasiewicz condition, smoothness, and Lipschitz continuity), excess risk bounds of at most $O\left(\log^2(n)/n^2\right)$ are achievable. To our knowledge, our analysis also provides the tightest high-probability bounds for gradient-based generalization gaps in nonconvex settings.
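One way to gauge the size of the improvement is to note that $\log^2(n)/n^2 = \left(\log(n)/n\right)^2$, so the new rate is the square of the prior one. The sketch below evaluates both rates for a few sample sizes. It drops all constants and the $\log(1/\delta)$ dependence, which is an assumption made purely for illustration, so only the relative scale of the two columns is meaningful.

```python
import math

def prior_rate(n: int) -> float:
    # O(log(n)/n) rate of Klochkov & Zhivotovskiy (2021), constants dropped
    return math.log(n) / n

def new_rate(n: int) -> float:
    # O(log^2(n)/n^2) rate stated in the abstract, constants dropped
    return math.log(n) ** 2 / n ** 2

for n in (10**2, 10**3, 10**4, 10**5, 10**6):
    ratio = new_rate(n) / prior_rate(n)  # equals log(n)/n
    print(f"n={n:>8}  log(n)/n={prior_rate(n):.2e}  "
          f"log^2(n)/n^2={new_rate(n):.2e}  ratio={ratio:.2e}")
```

The ratio column equals $\log(n)/n$, so the gap between the two bounds widens almost linearly in $n$.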

Paper Structure

This paper contains 20 sections, 21 theorems, 139 equations, 1 table.

Key Result

Theorem 1

Assume for any $z$, $f(\cdot, z)$ is $M$-Lipschitz. If $A$ is $\beta$-uniformly-stable in gradients, then for any $\delta \in (0,1)$, the following inequality holds with probability at least $1-\delta$

Theorems & Definitions (51)

  • Definition 1
  • Definition 2: Uniform Stability in Gradients
  • Remark 1
  • Theorem 1: Generalization via Stability in Gradients fan2024high
  • Remark 2
  • Theorem 2: Sharper Generalization via Stability in Gradients
  • Remark 3
  • Definition 3: Strong Growth Condition
  • Proposition 1: SGC case
  • Remark 4
  • ...and 41 more
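Theorem 1 and Definition 2 both rely on uniform stability in gradients. In the form commonly used in the literature (the paper's Definition 2 may differ in details), a learner $A$ is $\beta$-uniformly stable in gradients if, for all datasets $S, S'$ that differ in a single example, $\sup_z \|\nabla f(A(S), z) - \nabla f(A(S'), z)\|_2 \le \beta$. The sketch below estimates this quantity empirically for a hypothetical toy learner, ridge regression, which is a strongly convex ERM. It uses one neighbouring pair and finitely many test points $z$, so it only gives a lower estimate of $\beta$. The names `ridge_fit`, `loss_grad` and `gradient_stability_estimate` are introduced here for illustration. The loss $f(w, z) = \tfrac12 (w^\top x - y)^2$ excludes the regularizer, which is also an assumption.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Minimizer of (1/n) sum_i 0.5 (w.x_i - y_i)^2 + (lam/2) ||w||^2 (strongly convex ERM)."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def loss_grad(w: np.ndarray, x: np.ndarray, y: float) -> np.ndarray:
    """Gradient in w of the unregularized squared loss f(w, z) = 0.5 (w.x - y)^2."""
    return (w @ x - y) * x

def gradient_stability_estimate(n: int, d: int = 5, lam: float = 0.1,
                                n_test: int = 500, seed: int = 0) -> float:
    """Lower estimate of beta from one neighbouring pair (S, S') and n_test random points z."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)

    # S' is S with its first example replaced by a fresh draw.
    X2, y2 = X.copy(), y.copy()
    X2[0] = rng.uniform(-1.0, 1.0, size=d)
    y2[0] = X2[0] @ w_true + 0.1 * rng.normal()

    w_s, w_s2 = ridge_fit(X, y, lam), ridge_fit(X2, y2, lam)

    # Approximate sup_z || grad f(A(S), z) - grad f(A(S'), z) || over random test points.
    Xt = rng.uniform(-1.0, 1.0, size=(n_test, d))
    yt = Xt @ w_true
    return float(max(np.linalg.norm(loss_grad(w_s, x, t) - loss_grad(w_s2, x, t))
                     for x, t in zip(Xt, yt)))

for n in (100, 1_000, 10_000):
    print(n, gradient_stability_estimate(n))
```

For this loss the gradient difference equals $x x^\top (w_S - w_{S'})$. Replacing one of $n$ examples moves the ridge solution by roughly $O(1/(\lambda n))$, so the printed estimates should shrink roughly like $1/n$. This is the kind of $\beta$ that Theorem 1 converts into a high-probability bound on the gradient generalization gap.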