Time-uniform concentration bounds for iterative algorithms

Tuan Pham, Alessandro Rinaldo, Purnamrita Sarkar

TL;DR

This work develops a time-uniform concentration framework for stochastic iterative algorithms governed by recursive inequalities of the Robbins–Siegmund type, without relying on exponential martingale constructions. The authors prove a quantitative Robbins–Siegmund lemma that yields all-time bounds decaying as $O(\log\log t/t)$ when $\eta_t = \Omega(1/t)$ and the noise term $U_t$ is suitably controlled, and they show via a matching lower bound that this rate is optimal. They apply the framework to SGD under strong convexity and under the Polyak–Łojasiewicz (PL) condition, to three variants of Oja's algorithm for streaming PCA, and to Robbins–Monro schemes, and they also give an explicit step-size construction in a simplified setting. The results provide time-uniform guarantees for both last-iterate and whole-trajectory performance, enabling anytime-valid inference and stopping decisions in sequential learning tasks. The analysis targets bounded noise, with possible extensions to sub-Gaussian noise discussed in the supplement. The authors also outline future directions, including streaming $k$-PCA and stochastic heavy-ball methods, and highlight the practical relevance of the results to streaming and online learning.
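To make the $O(\log\log t/t)$ rate concrete, here is a minimal Python sketch of SGD under strong convexity, assuming a one-dimensional objective $f(x) = (x-\theta)^2/2$, i.i.d. noise uniform on $[-1, 1]$, and $\eta_t = 1/t$ (all illustrative choices, not the paper's setting). With these choices the iterate after $t$ steps equals $\theta$ minus the running mean of the first $t$ noise draws, so by the law of the iterated logarithm the squared error is of order $\log\log t/t$ infinitely often along the trajectory, consistent with the matching lower bound mentioned above. The function name `sgd_trajectory` is hypothetical.

```python
import numpy as np


def sgd_trajectory(theta: float, n_steps: int, seed: int = 0):
    """SGD on f(x) = (x - theta)^2 / 2 with step size eta_t = 1/t.

    The stochastic gradient at step t is (x_t - theta) + xi_t, with noise
    xi_t drawn i.i.d. uniformly from [-1, 1] (bounded and mean zero).
    Returns (iterates, noise): iterates[k] is the iterate after k steps.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, size=n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0                             # arbitrary starting point
    for t in range(1, n_steps + 1):
        eta = 1.0 / t                      # eta_t = 1/t, so eta_t = Omega(1/t)
        grad = (x[t - 1] - theta) + noise[t - 1]
        x[t] = x[t - 1] - eta * grad
    return x, noise


if __name__ == "__main__":
    theta, n = 2.0, 100_000
    x, _ = sgd_trajectory(theta, n)
    t = np.arange(1, n + 1)
    sq_err = (x[1:] - theta) ** 2          # squared error after t steps
    tail = t >= 100                        # skip small t, where log log t is tiny or undefined
    scaled = t[tail] * sq_err[tail] / np.log(np.log(t[tail]))
    print("final squared error:", sq_err[-1])
    print("max over t >= 100 of t * err_t / log log t:", scaled.max())
```

The printed ratio is an empirical illustration only: it should stay bounded by a modest constant rather than grow with $n$, but it is not a substitute for the paper's time-uniform bounds.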

Abstract

We develop a new framework for deriving time-uniform concentration bounds for the output of stochastic sequential algorithms satisfying certain recursive inequalities akin to those defining the almost-supermartingale processes introduced by Robbins and Siegmund (1971). Our approach is widely applicable and can be deployed in settings in which exponential supermartingale processes, required by prevailing methodologies for anytime-valid concentration inequalities, are not readily available. Our results can be viewed as quantitative versions of the classical Robbins–Siegmund lemma. We demonstrate the effectiveness of our method by providing new and optimal time-uniform concentration bounds for Oja's algorithm for streaming PCA, stochastic gradient descent, and stochastic approximation.
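For readers unfamiliar with Oja's algorithm, the following is a minimal Python sketch of the classical normalized update $w_t = (w_{t-1} + \eta_t x_t x_t^\top w_{t-1}) / \|w_{t-1} + \eta_t x_t x_t^\top w_{t-1}\|$. The step-size form $\eta_t = c/(t + t_0)$, its constants, the random initialization, and the synthetic Gaussian data are illustrative assumptions; the paper analyzes three variants of Oja's algorithm whose exact forms are not reproduced here. The function name `oja_top_eigenvector` is hypothetical.

```python
import numpy as np


def oja_top_eigenvector(samples: np.ndarray, c: float = 1.0,
                        t0: float = 100.0, seed: int = 0) -> np.ndarray:
    """Estimate the leading eigenvector of E[x x^T] from a stream of samples.

    Classical normalized Oja update, one sample at a time:
        w <- (w + eta_t * x_t (x_t^T w)) / ||w + eta_t * x_t (x_t^T w)||
    with the illustrative step size eta_t = c / (t + t0).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(samples.shape[1])
    w /= np.linalg.norm(w)                 # random unit-norm initialization
    for t, x in enumerate(samples, start=1):
        eta = c / (t + t0)
        w = w + eta * x * (x @ w)          # rank-one step along x x^T w
        w /= np.linalg.norm(w)             # renormalize onto the unit sphere
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    eigs = np.array([5.0, 1.0, 1.0, 1.0])  # covariance diag(eigs); top eigenvector e_1
    X = rng.standard_normal((20_000, eigs.size)) * np.sqrt(eigs)
    w = oja_top_eigenvector(X)
    print("alignment |<w, e_1>|:", abs(w[0]))
    print("sin^2 error:", 1.0 - w[0] ** 2)
```

Normalizing after every step keeps the iterate on the unit sphere, so the quantity of interest is the alignment $|\langle w_t, e_1 \rangle|$ (equivalently the $\sin^2$ error), which is the kind of last-iterate quantity a time-uniform bound would control.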

Paper Structure

This paper contains 28 sections, 23 theorems, 330 equations, and 1 figure.

Key Result

Lemma 1

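Let $\left\{ L_t \right\}_{t\geq 1}$ be an adapted, non-negative process with respect to the filtration $\left\{ \mathcal{F}_t ; t \geq 1 \right\}$. Suppose there exist a positive, deterministic sequence $\left\{ \eta_t; t\geq 1 \right\}$ and a non-negative, adapted sequence $\beta_t$ such that
$$\mathbb{E}\left[ L_{t+1} \mid \mathcal{F}_t \right] \leq (1-\eta_t) L_t + \beta_t \quad \text{for all } t \geq 1,$$
where $\left\{ \eta_t, \beta_t \right\}_{t\geq 1}$ satisfies
$$\sum_{t\geq 1} \eta_t = \infty \quad \text{and} \quad \sum_{t\geq 1} \beta_t < \infty \ \text{almost surely}.$$
Then, $\lim_t L_t = 0$, almost surely.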
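As a deterministic sanity check, assume the recursion in Lemma 1 takes the standard almost-supermartingale form $\mathbb{E}[L_{t+1} \mid \mathcal{F}_t] \leq (1-\eta_t) L_t + \beta_t$ with $\sum_t \eta_t = \infty$ and $\sum_t \beta_t < \infty$ (an assumption about the form of the display, not a quotation). A deterministic sequence is trivially adapted, and equality is the extremal case. Taking the illustrative choices $\eta_t = 1/(t+1)$, $\beta_t = 1/(t+1)^2$ and $L_1 = 1$, multiplying the recursion by $t+1$ gives $(t+1)L_{t+1} = tL_t + 1/(t+1)$, hence $L_t = H_t/t$ with $H_t$ the $t$-th harmonic number, which tends to 0 as the lemma predicts. The sketch below checks this with exact rational arithmetic; the helper names `iterate_recursion` and `harmonic` are hypothetical.

```python
from fractions import Fraction


def iterate_recursion(n: int) -> list[Fraction]:
    """Iterate L_{t+1} = (1 - eta_t) L_t + beta_t exactly, for t = 1..n-1.

    Illustrative choices: eta_t = 1/(t+1) (so sum eta_t diverges),
    beta_t = 1/(t+1)^2 (so sum beta_t converges), and L_1 = 1.
    Returns [L_1, ..., L_n] as exact rationals.
    """
    L = [Fraction(1)]
    for t in range(1, n):
        eta = Fraction(1, t + 1)
        beta = Fraction(1, (t + 1) ** 2)
        L.append((1 - eta) * L[-1] + beta)
    return L


def harmonic(t: int) -> Fraction:
    """Exact harmonic number H_t = 1 + 1/2 + ... + 1/t."""
    return sum((Fraction(1, s) for s in range(1, t + 1)), Fraction(0))


if __name__ == "__main__":
    L = iterate_recursion(200)
    for t in (1, 10, 100, 200):
        print(t, float(L[t - 1]), float(harmonic(t) / t))
```

Exact fractions avoid any floating-point doubt when comparing against the closed form $H_t/t$; for long horizons a float version of the same loop would be faster.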

Figures (1)

  • Figure 1: Errors induced by adaptive step sizes

Theorems & Definitions (24)

  • Lemma 1: Simplified Robbins-Siegmund's lemma
  • Theorem 1
  • Proposition 1: maximal inequality
  • Theorem 2
  • Proposition 2
  • Corollary 1
  • Corollary 2
  • Proposition 3
  • Corollary 3
  • Proposition 4
  • ...and 14 more