Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis

Siddharth Chandak; Anuj Yadav; Ayfer Ozgur; Nicholas Bambos

Abstract

Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.
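The setting described above can be illustrated with a minimal Python sketch, which is not the authors' code. It runs the SA iteration $x_{k+1} = x_k - \eta_k (F(x_k) + w_k)$ for a strongly monotone linear operator $F(x) = Ax - b$, once with Gaussian noise (bounded second moment) and once with symmetric Pareto noise of tail index $\alpha = 1.5$, which has a finite mean but infinite variance. The sign convention, the step size $\eta_k = \beta/(k+K_0)$ (the constants $\beta$ and $K_0$ appear in Theorem 1, but the exact schedule is an assumption here), the helper names `symmetric_pareto` and `stochastic_approximation`, and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def symmetric_pareto(rng, alpha, size):
    """Zero-mean heavy-tailed noise: Pareto(alpha) magnitudes (minimum 1) with random signs.

    For 1 < alpha < 2 the mean exists (it is zero by symmetry) but the variance is infinite.
    """
    # Generator.pareto samples the Lomax (Pareto II) law; shifting by 1 gives a
    # classical Pareto variable with scale 1 and tail index alpha.
    magnitude = rng.pareto(alpha, size) + 1.0
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def stochastic_approximation(F, x0, noise, beta, K0, n_steps):
    """Run x_{k+1} = x_k - eta_k * (F(x_k) + w_k) with hypothetical step size eta_k = beta / (k + K0)."""
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        eta = beta / (k + K0)
        x = x - eta * (F(x) + noise(k))
    return x

# Strongly monotone linear operator F(x) = A x - b (A symmetric positive definite).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A, b)  # the root: F(x_star) = 0
F = lambda x: A @ x - b

rng = np.random.default_rng(0)
gaussian = lambda k: rng.standard_normal(2)                 # bounded second moment
heavy = lambda k: symmetric_pareto(rng, alpha=1.5, size=2)  # infinite variance

x_gauss = stochastic_approximation(F, np.zeros(2), gaussian, beta=2.0, K0=10, n_steps=20000)
x_heavy = stochastic_approximation(F, np.zeros(2), heavy, beta=2.0, K0=10, n_steps=20000)
print("Gaussian noise error:", np.linalg.norm(x_gauss - x_star))
print("Pareto noise error:  ", np.linalg.norm(x_heavy - x_star))
```

Repeating the comparison over several seeds typically shows larger and more erratic errors under the Pareto noise. The sketch does not attempt to reproduce the paper's rates or its noise-averaging argument.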

Paper Structure

This paper contains 31 sections, 9 theorems, 102 equations, and 3 figures.

Introduction
  Our Contributions
  Related Work
  Outline and Notation
Problem Formulation
Noise Models and Results
  Martingale Difference with Bounded Second Moment
  Heavy-Tailed Noise
  Long-Range Dependent Noise
Proof Outlines
  Why Does Traditional Analysis Fail?
  Proof Technique
  Heavy-Tailed Noise
  Long-Range Dependent Noise
Applications

Key Result

Theorem 1

Suppose Assumptions assu:monotone and assu:standard hold. Then there exist constants $C_1$, $C_2$, and $C_3$ such that, if $\beta > C_1$ and $K_0 \geq C_2$, then for all $k \geq 0$,

Figures (3)

Figure 1: Performance of SGD under heavy-tailed noise ($\alpha$-Pareto distribution).
Figure 2: Performance of SGD under LRD noise (fractional Gaussian noise, fGn).
Figure 3: Performance of gradient play under heavy-tailed and LRD noise models.
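Figure 2 uses fractional Gaussian noise (fGn) as the LRD noise model. As an illustrative sketch (the paper's own generator is not shown here), unit-variance fGn with Hurst index $H \in (0, 1)$ can be sampled exactly by a Cholesky factorization of its Toeplitz covariance, built from the autocovariance $\gamma(k) = \tfrac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$. For $H = 1/2$ this reduces to white noise; for $H > 1/2$, $\gamma(k)$ decays like $k^{2H-2}$ and is not summable, which is what makes the noise long-range dependent. The helper names, $H = 0.8$, and the scalar SA loop at the end (operator $F(x) = x - 1$, hypothetical step size $2/(k+10)$) are assumptions for illustration. The $O(n^3)$ Cholesky route is chosen for clarity; circulant-embedding methods such as Davies-Harte scale better.

```python
import numpy as np

def fgn_autocovariance(n, H):
    """gamma(k) for k = 0..n-1 of unit-variance fractional Gaussian noise with Hurst index H."""
    k = np.arange(n, dtype=float)
    return 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H) + np.abs(k - 1) ** (2 * H))

def sample_fgn(n, H, rng):
    """Exact fGn path of length n via Cholesky factorization of its Toeplitz covariance (O(n^3))."""
    gamma = fgn_autocovariance(n, H)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # |i - j|
    cov = gamma[lags]                                             # Cov[w_i, w_j] = gamma(|i - j|)
    L = np.linalg.cholesky(cov)                                   # cov = L @ L.T
    return L @ rng.standard_normal(n)

rng = np.random.default_rng(1)
w = sample_fgn(500, H=0.8, rng=rng)  # H > 1/2: long-range dependent

# Scalar SA for F(x) = x - 1 (root x* = 1), driven by the LRD noise sequence w.
x = 0.0
for k in range(len(w)):
    x -= (2.0 / (k + 10)) * ((x - 1.0) + w[k])
print("final iterate:", x)
```

Because consecutive fGn samples are positively correlated when $H > 1/2$, their errors do not cancel as quickly as i.i.d. noise would, so the final iterate can drift noticeably from $x^* = 1$ in a single run.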

Theorems & Definitions (11)

Theorem 1
Theorem 2
Theorem 3
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
