Computing the Bias of Constant-step Stochastic Approximation with Markovian Noise

Sebastian Allmeier; Nicolas Gast

TL;DR

We develop a method based on comparing infinitesimal generators to study the bias of constant-step stochastic approximation with Markovian noise, i.e., the expected difference between $\theta_n$ (the value at iteration $n$) and $\theta^*$ (the unique equilibrium of the corresponding ODE).

Abstract

We study stochastic approximation algorithms with Markovian noise and constant step-size $\alpha$. We develop a method based on infinitesimal generator comparisons to study the bias of the algorithm, which is the expected difference between $\theta_n$ (the value at iteration $n$) and $\theta^*$ (the unique equilibrium of the corresponding ODE). We show that, under some smoothness conditions, this bias is of order $O(\alpha)$. Furthermore, we show that the time-averaged bias is equal to $\alpha V + O(\alpha^2)$, where $V$ is a constant characterized by a Lyapunov equation, showing that $\mathbb{E}[\bar{\theta}_n] \approx \theta^* + V\alpha + O(\alpha^2)$, where $\bar{\theta}_n = (1/n)\sum_{k=1}^{n}\theta_k$ is the Polyak-Ruppert average. We also show that $\bar{\theta}_n$ converges with high probability around $\theta^* + \alpha V$. We illustrate how to combine this with Richardson-Romberg extrapolation to derive an iterative scheme with a bias of order $O(\alpha^2)$.
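The extrapolation step mentioned above follows from the bias expansion by the standard Richardson-Romberg argument. The sketch below assumes the usual two-point combination of the step sizes $\alpha$ and $\alpha/2$ (the paper's exact scheme may differ in its details) and writes $\bar{\theta}_n^{(\alpha)}$ for the Polyak-Ruppert average obtained with step size $\alpha$, a notation introduced here only for illustration:

$$
\begin{aligned}
\mathbb{E}\big[2\bar{\theta}_n^{(\alpha/2)} - \bar{\theta}_n^{(\alpha)}\big]
&\approx 2\Big(\theta^* + \tfrac{\alpha}{2}V + O(\alpha^2)\Big) - \Big(\theta^* + \alpha V + O(\alpha^2)\Big)\\
&= \theta^* + O(\alpha^2).
\end{aligned}
$$

The first-order terms cancel because $V$ does not depend on $\alpha$.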

Paper Structure (30 sections, 11 theorems, 76 equations, 6 figures)

Introduction
Related work
Model and preliminaries
Model and first assumptions
Averaged values, average ODE and stability assumption
Discussion on the assumptions and limits
Notations
Main results and illustrations
Theoretical results
The value of extrapolation: Illustration of Theorems \ref{thrm:limit_N_refinement} and \ref{thrm:limit_N_refinement_proba}
Proof overview and generator method
Proof overview
Deterministic recurrence and comparison of generators
Derivation of $V_{\alpha,n}$ and of $V$ by comparing the generators
Conclusion / discussion
...and 15 more sections

Key Result

Theorem 1

Assume assumptions (A:mart) through (A:attractor) hold. Then there exist constants $C>0$ and $\alpha_0$ such that, for all $n$, all $\alpha\le\alpha_0$, and all $h\in\mathcal{C}^{3}(\Theta,\mathbb{R})$:

Figures (6)

Figure 1: Comparison of $\theta_n$, $\bar{\theta}_n$ and $\bar{\theta}_{n/2:n}$ for various $\alpha$.
Figure 2: Illustration of the error of $\bar{\theta}_n := \frac{1}{n} \sum_{k=1}^{n} \theta_{k}$ for various values of $\alpha=0.02\times2^{-k}$ with $k\in\{0,\dots,5\}$, and of the error of the extrapolation \ref{eq:extrapolation} for $\alpha=0.01$ and $\alpha=0.005$.
Figure 3: Overview of the proof. The dashed rectangles indicate the lemmas that are proven in Appendix \ref{sec:proof_lemma}.
Figure 4: Illustration of the behavior of $\theta_n$, $\varphi_{n}(\theta_0)$ and $\varphi_{n-k}(\theta_k)$.
Figure 5: Behavior of $\theta_n$, $\bar{\theta}_n$ and $\bar{\theta}_{n/2:n}$ for various values of $\alpha$. All $y$-axes have the same scale.
...and 1 more figure
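The $O(\alpha)$ bias of the averaged iterate and the effect of extrapolation shown in Figure 2 can be reproduced on a small example. The sketch below uses a hypothetical toy model that is not taken from the paper: a two-state Markov chain $X_n$ with transition matrix `P` drives the linear recursion $\theta_{n+1} = \theta_n + \alpha\,(b(X_n) - a(X_n)\,\theta_n)$, with illustrative arrays `a` and `b`. Because the model is linear and the chain is finite, the stationary mean $\lim_n \mathbb{E}[\theta_n]$, which is also the limit of $\mathbb{E}[\bar{\theta}_n]$, solves a small linear system, so the bias is computed exactly rather than estimated. This direct computation is a swap-in for illustration, not the generator-comparison method of the paper. The helper names `stationary_mean`, `extrapolate` and `simulate` are hypothetical; `simulate` also returns the tail average $\bar{\theta}_{n/2:n}$ used in Figures 1 and 5.

```python
import numpy as np

# Hypothetical toy model (not from the paper): a two-state Markov chain X_n drives
#   theta_{n+1} = theta_n + alpha * (b[X_n] - a[X_n] * theta_n).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix of X_n
a = np.array([1.0, 3.0])     # state-dependent gain (illustrative values)
b = np.array([2.0, 0.0])     # state-dependent offset (illustrative values)


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to a probability vector."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()


pi = stationary_distribution(P)
# Equilibrium of the averaged ODE d(theta)/dt = pi.b - (pi.a) * theta.
theta_star = (pi @ b) / (pi @ a)


def stationary_mean(alpha):
    """Exact lim E[theta_n] for the toy model.

    m[y] = E[theta 1{X = y}] in steady state satisfies
    m = P^T (I - alpha * diag(a)) m + alpha * P^T (b * pi).
    """
    M = np.eye(len(pi)) - P.T @ np.diag(1.0 - alpha * a)
    m = np.linalg.solve(M, alpha * P.T @ (b * pi))
    return m.sum()


def extrapolate(alpha):
    """Richardson-Romberg combination of the step sizes alpha and alpha / 2."""
    return 2.0 * stationary_mean(alpha / 2) - stationary_mean(alpha)


def simulate(alpha, n, theta0=0.0, seed=0):
    """One sample path: returns theta_n, the Polyak-Ruppert average, and the average of the last n/2 iterates."""
    rng = np.random.default_rng(seed)
    cum_P = np.cumsum(P, axis=1)
    u = rng.random(n)
    x = int(rng.choice(len(pi), p=pi))  # start the chain in its stationary law
    theta = theta0
    path = np.empty(n)
    for k in range(n):
        theta += alpha * (b[x] - a[x] * theta)
        # Sample the next state from row x of P by inverting its cumulative distribution.
        x = min(int(np.searchsorted(cum_P[x], u[k], side="right")), len(pi) - 1)
        path[k] = theta
    return path[-1], path.mean(), path[n // 2:].mean()


if __name__ == "__main__":
    for alpha in (0.02, 0.01, 0.005):
        print(f"alpha={alpha}: bias of the averaged iterate = {stationary_mean(alpha) - theta_star:.6f}")
    print(f"extrapolated bias (alpha=0.01, 0.005): {extrapolate(0.01) - theta_star:.2e}")
    last, pr_avg, tail_avg = simulate(0.01, 100_000)
    print(f"one path: theta_n={last:.4f}, Polyak-Ruppert={pr_avg:.4f}, tail={tail_avg:.4f}")
```

For this particular toy model, a direct calculation gives $\theta^* = 0.8$ and a stationary bias of exactly $2.24\alpha/(1+4.2\alpha)$, so $V = 2.24$ here. Halving $\alpha$ roughly halves the bias, while the extrapolated value is off by about $4.7\alpha^2$. A single simulated path fluctuates around these values with a noise level that shrinks as $n$ grows.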

Theorems & Definitions (19)

theorem 1
theorem 2
theorem 3
proposition 1
proof
proposition 2
proof: Proof of Proposition \ref{prop:V}
lemma 1
proof
lemma 2
...and 9 more