Central Limit Theorem for Two-Timescale Stochastic Approximation with Markovian Noise: Theory and Applications

Jie Hu, Vishwaraj Doshi, Do Young Eun

TL;DR

This work develops a central limit theorem for two-timescale stochastic approximation with controlled Markovian noise, clarifying how an underlying Markov chain shapes the joint asymptotics of the slow and fast iterates. A key methodological advance is a single-timescale reduction via Poisson decomposition that isolates Markovian bias and yields explicit Lyapunov-form expressions for the limiting covariances. The results show asymptotic independence between the two coordinates and provide closed-form covariance integrals, enabling comparison across sampling schemes through efficiency ordering. The paper also applies the CLT to nonlinear GTD algorithms in reinforcement learning, proving identical asymptotic performance for GTD2 and TDC under Markovian samples, supported by simulations. Collectively, the findings broaden TTSA applicability to distributed optimization and policy evaluation with nonlinear function approximation, and offer a principled lens for designing sampling strategies with favorable asymptotics.

Abstract

Two-timescale stochastic approximation (TTSA) is among the most general frameworks for iterative stochastic algorithms. It includes well-known stochastic optimization methods such as SGD variants and those designed for bilevel or minimax problems, as well as reinforcement learning algorithms such as the family of gradient-based temporal difference (GTD) algorithms. In this paper, we conduct an in-depth asymptotic analysis of TTSA under controlled Markovian noise via a central limit theorem (CLT), uncovering the coupled dynamics of TTSA influenced by the underlying Markov chain, which previous CLT results for TTSA, limited to martingale difference noise, have not addressed. Building upon our CLT, we extend the use of efficient sampling strategies from vanilla SGD to a wider TTSA context in distributed learning, thus broadening the scope of Hu et al. (2022). In addition, we leverage our CLT result to deduce the statistical properties of GTD algorithms with nonlinear function approximation using Markovian samples and show their identical asymptotic performance, a perspective not evident from current finite-time bounds.

Paper Structure

This paper contains 33 sections, 27 theorems, 193 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Lemma 2.1

Under the paper's five standing assumptions (labelled assump:one through assump:five), the iterates $({\mathbf{x}}_n,{\mathbf{y}}_n)$ of the general two-timescale SA recursion (eqn:general_two_timescale_SA) converge almost surely to a set of roots, i.e., $({\mathbf{x}}_n, {\mathbf{y}}_n) \to \bigcup_{{\mathbf{x}}^* \in \Lambda} ({\mathbf{x}}^*, \lambda({\mathbf{x}}^*))$ a.s.
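
To make the statement concrete, the following is a minimal sketch of a scalar two-timescale recursion driven by a two-state Markov chain. It is a toy illustration, not the paper's setting: the drift functions, the noise tables `b` and `c`, the transition probability `stay_prob`, and the step-size exponents 0.9 and 0.6 are all assumptions chosen for this example. In this toy problem the fast iterate tracks $\lambda(x) = x$ and the unique root is $x^* = 1$, so both iterates should settle near 1.

```python
import random


def run_ttsa(num_steps=200_000, stay_prob=0.9, seed=0):
    """Toy two-timescale SA with Markovian noise (illustrative assumptions only)."""
    rng = random.Random(seed)
    # Noise tables indexed by the chain state. The chain is symmetric, so its
    # stationary distribution is (0.5, 0.5): b averages to 1 and c averages to 0.
    b = (0.0, 2.0)   # enters the slow update
    c = (1.0, -1.0)  # enters the fast update
    state = 0
    x, y = 5.0, -3.0  # arbitrary starting point
    for n in range(1, num_steps + 1):
        # Advance the two-state Markov chain: flip with probability 1 - stay_prob.
        if rng.random() > stay_prob:
            state = 1 - state
        beta = n ** -0.9   # slow step size (assumed exponent)
        gamma = n ** -0.6  # fast step size (assumed exponent), decays more slowly
        # Both updates use the previous (x, y) and the same noise sample.
        x_new = x + beta * (b[state] - y)       # slow mean field: 1 - y
        y_new = y + gamma * (x + c[state] - y)  # fast mean field: x - y
        x, y = x_new, y_new
    return x, y


x_final, y_final = run_ttsa()
print(f"x_n = {x_final:.3f}, y_n = {y_final:.3f}")
```

Because consecutive noise samples are positively correlated here (the chain stays put with probability 0.9), the iterates fluctuate more than they would under i.i.d. sampling with the same marginals, which is the kind of effect the limiting covariances in the CLT are meant to quantify.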

Figures (7)

  • Figure 1: Efficiency Ordering: From SGD to TTSA.
  • Figure 2: Comparison of the performance among different sampling strategies in momentum SGD.
  • Figure 3: Comparison of nonlinear GTD2 and TDC algorithms in the $5$-state random walk task.
  • Figure 4: Comparison of the performance ordering in SGDA in terms of iterates ${\mathbf{x}}_n$.
  • Figure 5: Comparison of the performance ordering in SGDA in terms of iterates ${\mathbf{y}}_n$.
  • ...and 2 more figures

Theorems & Definitions (36)

  • Lemma 2.1: Almost Sure Convergence
  • Theorem 2.2: Central Limit Theorem
  • Definition 3.1: Efficiency Ordering (mira2001ordering; hu2022efficiency)
  • Proposition 3.2
  • Proposition 3.3
  • Theorem A.1.1: Theorem 4 of yaji2020stochastic
  • Lemma A.2.1
  • Lemma A.2.2
  • Remark A.2.1
  • Remark A.2.2
  • ...and 26 more