Finite Sample and Large Deviations Analysis of Stochastic Gradient Algorithm with Correlated Noise

George Yin; Vikram Krishnamurthy

TL;DR

This work provides a finite-sample analysis of a projected stochastic gradient algorithm with correlated noise. It proves that the mean-square error decays as $\mathbb{E}\|\theta_n-\theta^*\|^2 = O(1/n)$ and that the expected regret grows at most logarithmically, $\mathbb{E}\{\text{Regret}_n\} \le K L \log n$. The authors handle correlated disturbances with a perturbed Lyapunov function $W = V + V_1$, which cancels the problematic noise terms and yields an $O(1/n)$ drift; the i.i.d. case is discussed separately. They also analyze escape times from a neighborhood of the optimum using large-deviations theory, showing that escape probabilities are exponentially small and that expected residence times in the neighborhood are exponentially long. The results rely on a convex, smooth objective, mixing assumptions on the noise, and a local quadratic approximation around the minimizer, which makes them relevant to finite-sample guarantees in stochastic optimization under dependent noise.
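
A minimal Monte Carlo sketch of the $O(1/n)$ mean-square behavior follows. The setting is a hypothetical toy problem chosen for illustration, not the paper's model. It minimizes the scalar quadratic $f(\theta) = a\theta^2/2$ over $[-R, R]$, so $\theta^* = 0$, with step size $\epsilon_n = c/(n+1)$. The gradient noise is a stationary AR(1) process with correlation $\rho$. The function name `projected_sgd_mse` and all parameter values are assumptions. In this scalar example the $O(1/n)$ rate requires $ac > 1/2$. With $a = c = 1$ and a projection that does not bind, the recursion reduces exactly to minus the running average of the noise. As a result, $n\,\mathbb{E}\theta_n^2$ should stay near the noise's long-run variance $(1+\rho)/(1-\rho) = 3$ rather than its marginal variance 1. This gap is where the correlation shows up.

```python
import numpy as np


def projected_sgd_mse(n_steps, n_runs=2000, a=1.0, c=1.0, rho=0.5, radius=5.0, seed=0):
    """Monte Carlo estimate of E|theta_n - theta*|^2 for projected SGD.

    Hypothetical toy problem (not the paper's): minimize f(theta) = a*theta^2/2
    over [-radius, radius], so theta* = 0. The gradient is observed as
    a*theta + xi_n, where xi_n is a stationary AR(1) process with unit
    variance and lag-one correlation rho (correlated, not i.i.d., noise).
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_runs, 2.0)            # same initial point for every run
    xi = rng.standard_normal(n_runs)        # AR(1) started in its stationary law N(0, 1)
    scale = np.sqrt(1.0 - rho**2)           # keeps Var(xi_n) = 1 for all n
    for n in range(n_steps):
        step = c / (n + 1)                  # decreasing step size eps_n = c/(n+1)
        grad = a * theta + xi               # noisy gradient observation
        theta = np.clip(theta - step * grad, -radius, radius)  # projection onto [-R, R]
        xi = rho * xi + scale * rng.standard_normal(n_runs)    # correlated noise update
    return float(np.mean(theta**2))


for n in (250, 1000, 4000):
    mse = projected_sgd_mse(n)
    print(f"n={n:5d}  MSE={mse:.5f}  n*MSE={n * mse:.3f}")
```

Setting `rho=0.0` gives the i.i.d. case. There $n\,\mathbb{E}\theta_n^2$ should instead settle near 1, which makes the variance inflation caused by correlated noise easy to compare.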

Abstract

We analyze the finite-sample regret of a decreasing-step-size stochastic gradient algorithm. We assume correlated noise and use a perturbed Lyapunov function as a systematic approach for the analysis. Finally, we analyze the escape time of the iterates using large deviations theory.
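
The exponential escape estimates can be seen in a special case where everything is explicit. This is a hedged illustration, not the paper's large-deviations argument. It assumes a constant step size $\mu$ (the algorithm above uses decreasing steps), i.i.d. Gaussian noise $w_k$, and the hypothetical scalar recursion $\theta_{k+1} = \theta_k - \mu(a\theta_k + \sigma w_k)$ with $0 < \mu a < 2$. Its stationary law is $N(0, v)$ with $v = \mu\sigma^2/(a(2-\mu a))$. The probability $p(\mu)$ of lying outside the neighborhood $|\theta| \le \delta$ therefore has a closed form, and $-\mu \log p(\mu) \to a\delta^2/\sigma^2$ as $\mu \to 0$. In other words, the exit probability is exponentially small in $1/\mu$. Started from the stationary law, a union bound gives $P(\text{exit within } N \text{ steps}) \le N\,p(\mu)$, so residence times are exponentially long. The function name `stationary_exit_prob` and the parameter values are illustrative assumptions.

```python
import math


def stationary_exit_prob(mu, a=1.0, sigma=1.0, delta=0.5):
    """P(|theta| > delta) under the stationary law of the hypothetical
    constant-step recursion theta_{k+1} = theta_k - mu*(a*theta_k + sigma*w_k),
    with w_k i.i.d. N(0, 1) and 0 < mu*a < 2.

    The stationary law is N(0, v) with v = mu*sigma^2 / (a*(2 - mu*a)),
    and P(|N(0, v)| > delta) = erfc(delta / sqrt(2*v)).
    """
    v = mu * sigma**2 / (a * (2.0 - mu * a))
    return math.erfc(delta / math.sqrt(2.0 * v))


# As mu shrinks, -mu*log(p) approaches the rate a*delta^2/sigma^2 (0.25 here).
for mu in (0.1, 0.03, 0.01, 0.003, 0.001):
    p = stationary_exit_prob(mu)
    print(f"mu={mu:<6} P(exit)={p:.3e}  -mu*log P={-mu * math.log(p):.4f}")
```

With the default parameters, the printed values of $-\mu \log p$ decrease toward $a\delta^2/\sigma^2 = 0.25$ as $\mu$ shrinks. Meanwhile, $p$ itself drops by many orders of magnitude.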
