Table of Contents
Fetching ...

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn

TL;DR

The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.

Abstract

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ θ_{n+1}= θ_n + α_{n + 1} f(θ_n, Φ_{n+1}) $$ where $ \{ Φ_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\textsf{E}[ z_n z_n^T ]$ to the asymptotic covariance in the CLT, where $z_n =: (θ_n-θ^*)/\sqrt{α_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n =: \sqrt{n} [θ^{\text{PR}}_n -θ^*]$, of the averaged parameters $θ^{\text{PR}}_n =:n^{-1} \sum_{k=1}^nθ_k$, subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $θ$, and $Φ$ is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of $θ_n$ is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions.The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

TL;DR

The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.

Abstract

The paper concerns the -dimensional stochastic approximation recursion, where is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3): (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in . (ii) A functional central limit theorem (CLT) is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance to the asymptotic covariance in the CLT, where . (iii) The CLT holds for the normalized version , of the averaged parameters , subject to standard assumptions on the step-size. Moreover, the covariance in the CLT coincides with the minimal covariance of Polyak and Ruppert. (iv) An example is given where and are linear in , and is a geometrically ergodic Markov chain but does not satisfy (DV3). While the algorithm is convergent, the second moment of is unbounded and in fact diverges. This arXiv version represents a major extension of the results in prior versions.The main results now allow for parameter-dependent noise, as is often the case in applications to reinforcement learning.

Paper Structure

This paper contains 9 sections, 7 theorems, 42 equations.

Key Result

Proposition 1

[proposition]t:v-uni-w

Theorems & Definitions (9)

  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Proposition 2
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • Proposition 3