A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

Sajad Khodadadian, Martin Zubeldia

TL;DR

The paper addresses the lack of non-asymptotic, high-probability bounds for Polyak–Ruppert averaged stochastic approximation in general settings. It develops a general framework that converts high-probability bounds on unaveraged SA iterates into sharp finite-time bounds for the averaged sequence, yielding a leading 1/(k+1) convergence rate and explicit tail terms. A tightness result demonstrates that the leading term is optimal up to a universal constant, and the framework is applied to contractive SA and several reinforcement learning algorithms, including averaged TD-learning, Q-learning, and off-policy TD-learning, deriving new high-probability bounds in settings where prior analyses were limited. The results guide step-size selection and provide practically meaningful concentration guarantees, enhancing reliability for SA-based optimization and RL methods. Overall, the framework offers a modular, broadly applicable approach to quantify the distributional behavior of averaged SA iterates in finite time.

Abstract

Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing non-asymptotic concentration bounds for the error of averaged SA iterates. Our approach assumes access to individual concentration bounds for the unaveraged iterates and yields a sharp bound on the averaged iterates. We also construct an example, showing the tightness of our result up to constant multiplicative factors. As direct applications, we derive tight concentration bounds for contractive SA algorithms and for algorithms such as temporal difference learning and Q-learning with averaging, obtaining new bounds in settings where traditional analysis is challenging.
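
To make the setting concrete, here is a minimal sketch of Polyak-Ruppert averaging on a toy one-dimensional problem. It is not taken from the paper. It rests on several assumptions: the update rule $x_{i+1} = x_i + \alpha_i\,(x^* - x_i + w_i)$ with Gaussian noise $w_i$, the step size $\alpha_i = \alpha/(i+1)^{\xi}$, the uniform average over $x_0,\dots,x_k$, and the hypothetical function name `averaged_sa` with all of its parameters.

```python
import random


def averaged_sa(num_steps, x_star=1.0, alpha=1.0, xi=0.75, noise_std=1.0, seed=0):
    """Run a toy scalar SA recursion and return (last iterate, uniform average).

    Illustrative assumptions (not from the paper): the operator is the
    contraction x -> x_star, the noise is i.i.d. Gaussian, and the step
    size is alpha / (i + 1) ** xi with xi < 1.
    """
    rng = random.Random(seed)
    x = 0.0              # x_0, an arbitrary starting point
    running_sum = x      # the average includes x_0
    for i in range(num_steps):
        step = alpha / (i + 1) ** xi
        # Noisy observation of the direction toward the fixed point x_star.
        noisy_direction = (x_star - x) + rng.gauss(0.0, noise_std)
        x = x + step * noisy_direction
        running_sum += x
    average = running_sum / (num_steps + 1)  # mean of x_0, ..., x_k
    return x, average


if __name__ == "__main__":
    last, avg = averaged_sa(num_steps=2000)
    print(f"last iterate error: {abs(last - 1.0):.4f}")
    print(f"averaged error:     {abs(avg - 1.0):.4f}")
```

In this toy setup the last iterate fluctuates at a scale set by the current step size, while the average smooths those fluctuations out. Repeating the run over many seeds gives an empirical view of the tail behavior that the paper's high-probability bounds quantify.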

Paper Structure

This paper contains 32 sections, 16 theorems, 140 equations.

Key Result

Theorem 4.1

Fix $k\geq 1$ and $\xi<1$. Suppose that, for any $\delta'\in (0,1)$, with probability at least $1-\delta'$, we have $\|x_i-x^*\|^2_c \leq \alpha_i f_\xi(\delta',k)$ for all $0\leq i\leq k$, for some function $f_\xi$. Then, under assumptions ass:norm_smoothness, ass:operators, and ass:sub-Gaussian, ... where ... with ... Given two norms $\|\cdot\|_a$ and $\|\cdot\|_b$, we define the constants $\ell_{ab}$ an...

Theorems & Definitions (26)

  • Remark
  • Remark
  • Theorem 4.1
  • Corollary 4.1
  • Proposition 4.1
  • Lemma 4.1
  • Corollary 4.2
  • Proposition 4.2
  • Remark
  • Theorem 5.1
  • ...and 16 more