
Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

Xiaokang Pan, Xingyu Li, Jin Liu, Tao Sun, Kai Sun, Lixing Chen, Zhe Qu

TL;DR

The paper addresses generalization in STORM-based stochastic optimization across one- to $K$-level problems by introducing a $K$-level uniform stability framework and linking stability to excess risk. It analyzes STORM, COVER, and SVMR under convex and strongly convex settings, deriving explicit stability and excess-risk bounds that reveal how estimator variance and the number of levels influence generalization. A key finding is that adding levels increases the generalization error unless this is mitigated by larger initial estimator batches, while carefully chosen iteration counts $T$ yield favorable excess-risk rates (e.g., $O(1/\sqrt{n})$ in both the convex and strongly convex cases, with $T$ scaled appropriately). The paper also provides empirical validation of the stability-generalization trade-offs and practical guidance on batch sizing to improve generalization in multi-level STORM-based methods.

Abstract

STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one- to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, a gap that is particularly evident in the transition from one- to $K$-level optimization. This paper provides a comprehensive generalization analysis, based on algorithmic stability, of three representative STORM-based algorithms: STORM, COVER, and SVMR, for one-, two-, and $K$-level stochastic optimization under both convex and strongly convex settings. First, we define stability for $K$-level optimization and link it to generalization. Then, we present the stability results for the three STORM-based algorithms. Finally, we derive their excess risk bounds by balancing the stability results against the optimization errors. Our theoretical results provide strong evidence toward a more complete understanding of STORM-based algorithms: (1) Each estimator may reduce stability because of the variance between the estimator and its estimation target. (2) Each additional level may increase the generalization error, depending on the stability and on the variance between that level's cumulative stochastic gradient and the true gradient. (3) Increasing the batch size used for the initial computation of the estimators offers a favorable trade-off that improves generalization performance.
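To make the "recursive momentum" estimator concrete, here is a minimal one-level sketch on a toy problem, assuming the standard STORM update $d_t = \nabla f(x_t;\xi_t) + (1-a)\big(d_{t-1} - \nabla f(x_{t-1};\xi_t)\big)$ and an empirical risk $F_S(x) = \frac{1}{n}\sum_i \tfrac12 (x - z_i)^2$ over scalar data. The function name `storm_1d`, the step size `lr`, the momentum weight `a`, and the initial batch size `b0` are hypothetical illustration choices, not the paper's code; `b0` only mirrors the "initial batch" idea from point (3).

```python
import random


def storm_1d(data, steps=200, lr=0.05, a=0.1, b0=1, seed=0):
    """Minimize F_S(x) = mean_i 0.5 * (x - z_i)**2 with a STORM-style estimator.

    b0 is a hypothetical knob: the batch size used to build the initial
    estimator d_0, mirroring the "initial batch" discussed in the abstract.
    """
    rng = random.Random(seed)

    def grad(x, z):
        # Per-sample gradient of 0.5 * (x - z)**2.
        return x - z

    x_prev = 0.0
    batch = [rng.choice(data) for _ in range(b0)]
    d = sum(grad(x_prev, z) for z in batch) / b0  # d_0 from a mini-batch
    x = x_prev - lr * d                             # x_1 = x_0 - lr * d_0
    for _ in range(steps):
        z = rng.choice(data)  # one fresh sample, evaluated at BOTH x_t and x_{t-1}
        # Recursive momentum: d_t = g(x_t; z) + (1 - a) * (d_{t-1} - g(x_{t-1}; z))
        d = grad(x, z) + (1 - a) * (d - grad(x_prev, z))
        x_prev, x = x, x - lr * d
    return x


print(storm_1d([1.0, 2.0, 3.0], b0=3))
```

Running it should land near the empirical minimizer 2.0. The key design point is that the same sample `z` is evaluated at both the current and the previous iterate, so the correction term cancels most of the sampling noise; this estimator variance is what the stability analysis tracks.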

Paper Structure

This paper contains 23 sections, 39 theorems, 346 equations, 4 figures, 1 table, and 3 algorithms.

Key Result

Theorem 1

If part (iii) of the Lipschitz-continuity assumption holds and the randomized algorithm $A$ is uniformly stable, then for $K\geq 3$, $\mathbb{E}_{S,A}[F(A(S))-F_S(A(S))]$ is bounded by an expression (omitted in this summary), where $\operatorname{Var}_{k}(A(S)) = \mathbb{E}_{v^{(k)}}\big[\|f_{k} \circ f_{k-1}\circ \cdots \circ f_1(A(S)) - f_{k}^{v^{(k)}} \circ f_{k-1}\circ \cdots \circ f_1(A(S))\|^{2}\big]$.
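The variance term $\operatorname{Var}_k$ compares the exact level-$k$ map with a single random draw $f_k^{v^{(k)}}$, both applied to the exact inner composition $f_{k-1}\circ\cdots\circ f_1$. A minimal Monte Carlo sketch for scalar levels follows; the helper `var_k`, its arguments, and the toy maps are hypothetical and chosen only to illustrate the definition.

```python
import random


def var_k(x, inner, outer_mean, outer_sample, n_draws=10_000, seed=0):
    """Monte Carlo estimate of Var_k(x) for scalar-valued levels.

    inner: exact maps [f_1, ..., f_{k-1}], applied in that order.
    outer_mean: the exact level-k map f_k.
    outer_sample: callable (y, rng) returning f_k^{v}(y) for one random draw v.
    """
    rng = random.Random(seed)
    y = x
    for f in inner:  # y = f_{k-1} o ... o f_1(x); inner levels are exact
        y = f(y)
    target = outer_mean(y)
    sq_dev = 0.0
    for _ in range(n_draws):
        sq_dev += (target - outer_sample(y, rng)) ** 2
    return sq_dev / n_draws


# Toy 3-level example: f_1(u) = 2u, f_2(u) = u - 1, f_3(y) = y^2,
# and f_3^{v}(y) = y^2 + N(0, 0.5^2) additive noise (hypothetical).
est = var_k(
    x=1.5,
    inner=[lambda u: 2.0 * u, lambda u: u - 1.0],
    outer_mean=lambda y: y * y,
    outer_sample=lambda y, rng: y * y + rng.gauss(0.0, 0.5),
)
print(est)
```

With additive Gaussian noise of standard deviation 0.5, the exact value is $0.5^2 = 0.25$, so the estimate should be close to that. In the theorem, the point is evaluated at $A(S)$, so this variance depends on where the algorithm ends up.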

Figures (4)

  • Figure 1: SGD vs. STORM.
  • Figure 2: Effect of Level.
  • Figure 3: Effect of batch size.
  • Figure 4: Effect of noise.

Theorems & Definitions (88)

  • Definition 1: Uniform Stability
  • Theorem 1
  • Remark 1
  • Remark 2
  • Definition 2
  • Theorem 2: One-level, Stability, Convex
  • Remark 3
  • Theorem 3: Two-level, Stability, Convex
  • Remark 4
  • Theorem 4: $K$-level, Stability, Convex
  • ...and 78 more
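Uniform stability (Definition 1) compares the outputs of an algorithm trained on two neighboring datasets $S$ and $S'$ that differ in a single example, with the algorithm's internal randomness shared. The sketch below measures the parameter gap $|A(S) - A(S')|$ empirically; for a loss that is $L$-Lipschitz in the parameters, the loss difference is at most $L$ times this gap. It uses plain SGD on a toy quadratic as a hypothetical stand-in (not STORM, COVER, or SVMR), and taking a maximum over a few neighbors and seeds only gives a lower estimate of the supremum in the definition.

```python
import random


def sgd_1d(data, steps=300, lr=0.05, seed=0):
    """Plain SGD on F_S(x) = mean_i 0.5 * (x - z_i)**2, one sampled index per step."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        # Same seed and same len(data) -> same index sequence on S and S'.
        z = data[rng.randrange(len(data))]
        x -= lr * (x - z)
    return x


def stability_gap(data, i, replacement, steps=300, lr=0.05, seed=0):
    """|A(S) - A(S')| where S' replaces example i; randomness is shared."""
    neighbor = list(data)
    neighbor[i] = replacement
    return abs(sgd_1d(data, steps, lr, seed) - sgd_1d(neighbor, steps, lr, seed))


S = [float(j) for j in range(10)]
gaps = [stability_gap(S, i, replacement=100.0, seed=s)
        for i in range(len(S)) for s in range(5)]
print(max(gaps))  # empirical worst case over the neighbors and seeds tried
```

Because the two runs see identical index sequences, the gap only grows on steps that sample the replaced example and then contracts by a factor $(1 - \text{lr})$ per step, which is the same coupling argument that stability proofs formalize.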