Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis

Siddharth Chandak, Anuj Yadav, Ayfer Ozgur, Nicholas Bambos

Abstract

Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.
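As a rough illustration of the setting described in the abstract, the sketch below runs a scalar SA iteration toward the root of a strongly monotone operator under symmetric Pareto noise. Everything specific here is an assumption made for illustration and is not taken from the paper: the operator $F(x) = x - x^*$, the step size $\beta/(k + K_0)$, the default parameter values, and the function name `sa_heavy_tailed`. With tail index in $(1, 2]$ the noise has zero mean but infinite variance, which is the regime where classical bounded-second-moment analyses do not apply.

```python
import random


def sa_heavy_tailed(num_iters=20000, tail_index=1.5, beta=2.0, k0=10.0, seed=0):
    """Scalar SA sketch: x_{k+1} = x_k - a_k * (F(x_k) + noise_k).

    Assumptions (illustrative, not from the paper):
      * F(x) = x - x_star, a strongly monotone operator with root x_star
      * a_k = beta / (k + k0), a diminishing step size
      * noise_k = random sign times a Pareto(tail_index) draw, so it is
        symmetric and zero-mean for tail_index > 1, with infinite
        variance for tail_index <= 2
    """
    rng = random.Random(seed)
    x_star = 3.0  # hypothetical root of F
    x = 0.0       # initial iterate
    for k in range(num_iters):
        # Symmetrized Pareto noise: heavy-tailed on both sides.
        noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(tail_index)
        step = beta / (k + k0)
        # Noisy evaluation of F at the current iterate.
        x = x - step * ((x - x_star) + noise)
    return x, x_star


if __name__ == "__main__":
    x_final, x_star = sa_heavy_tailed()
    print(f"final iterate: {x_final:.4f}, root: {x_star}")
```

Changing `tail_index` toward 1 makes large noise spikes more frequent, which gives an informal feel for how heavier tails slow the iteration. The sketch does not reproduce the paper's experiments or its rates.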

Paper Structure

This paper contains 31 sections, 9 theorems, 102 equations, 3 figures.

Key Result

Theorem 1

Suppose Assumptions assu:monotone and assu:standard are satisfied. Then there exist constants $C_1$, $C_2$, and $C_3$ such that if $\beta > C_1$ and $K_0 \geq C_2$, then for all $k \geq 0$,

Figures (3)

  • Figure 1: Performance of SGD under heavy-tailed noise ($\alpha$-Pareto distribution).
  • Figure 2: Performance of SGD under LRD noise (fractional Gaussian noise, fGn).
  • Figure 3: Performance of gradient play under heavy-tailed and LRD noise models.

Theorems & Definitions (11)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • proof
  • Lemma 6
  • ...and 1 more