Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise

Xiaochi Qian, Zixuan Xie, Xinyu Liu, Shangtong Zhang

TL;DR

This paper establishes the first almost sure convergence rate and the first maximal concentration bound with exponential tails for general contractive stochastic approximation algorithms with Markovian noise.

Abstract

This paper establishes the first almost sure convergence rate and the first maximal concentration bound with exponential tails for general contractive stochastic approximation algorithms with Markovian noise. As a corollary, we also obtain convergence rates in $L^p$. Key to our successes is a novel discretization of the mean ODE of stochastic approximation algorithms using intervals with diminishing (instead of constant) length. As applications, we provide the first almost sure convergence rate for $Q$-learning with Markovian samples without count-based learning rates. We also provide the first concentration bound for off-policy temporal difference learning with Markovian samples.
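To make the $Q$-learning application concrete, the following is a minimal sketch of tabular $Q$-learning driven by a single Markovian trajectory, with a learning rate that depends only on the global step count and not on per-state-action visit counts. Everything specific here is an assumption for illustration and is not taken from the paper: the function names `q_learning_markovian` and `optimal_q`, the uniform behavior policy, and the polynomial schedule $\alpha_t = \alpha_0 / (t + t_0)^{\nu}$, which is only one plausible reading of the exponent $\nu$ in the Key Result.

```python
import numpy as np


def q_learning_markovian(P, R, gamma=0.9, nu=0.8, alpha0=1.0, t0=10,
                         steps=20000, seed=0):
    """Tabular Q-learning along ONE trajectory (Markovian samples).

    P: array of shape (S, A, S) with transition probabilities.
    R: array of shape (S, A) with deterministic rewards.
    The step size alpha_t = alpha0 / (t + t0)**nu is an ASSUMED schedule;
    it uses the global step t, not a visit count for (s, a).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    q = np.zeros((n_states, n_actions))
    s = 0
    for t in range(steps):
        # Fixed uniform behavior policy (an assumption), so the
        # state-action process is a time-homogeneous Markov chain.
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])
        alpha = alpha0 / (t + t0) ** nu
        td_error = R[s, a] + gamma * q[s_next].max() - q[s, a]
        q[s, a] += alpha * td_error
        # Samples are NOT i.i.d.: the next update starts from s_next.
        s = s_next
    return q


def optimal_q(P, R, gamma, iters=2000):
    """Reference solution via Q-value iteration, for comparison only."""
    q = np.zeros(R.shape)
    for _ in range(iters):
        # (S, A, S) @ (S,) -> (S, A): expected max next-state value.
        q = R + gamma * P @ q.max(axis=1)
    return q
```

Comparing `q_learning_markovian(P, R, gamma)` against `optimal_q(P, R, gamma)` on a small MDP with strictly positive transition probabilities gives an empirical view of the error $\lVert q_t - q_* \rVert_\infty$ whose decay the paper quantifies; the sketch itself makes no claim about the rate.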

Paper Structure

This paper contains 27 sections, 22 theorems, 120 equations, 2 tables.

Key Result

Theorem 1

Let the assumptions from the Markov chain assumption through the Lipschitz assumption, together with the learning rate assumption, hold. Let $\{w_t\}$ be the iterates generated by the stochastic approximation update. If $\nu < 1$, then for any $\zeta \in (0, \frac{3}{2} \nu - 1)$, If $\nu = 1$, then for any $\zeta \in (0, 1)$ and $\nu_1 > 0$,

Theorems & Definitions (26)

  • Remark 1: Pseudo-Contraction
  • Remark 2: Linear Stochastic Approximation
  • Theorem 1: Almost Sure Convergence Rates
  • Theorem 2: Concentration
  • Corollary 1: $L^p$ Convergence Rates
  • Theorem 3
  • Theorem 4
  • Remark 3
  • Lemma 1
  • Lemma 2
  • ...and 16 more