Central Limit Theorems for Asynchronous Averaged Q-Learning

Xingtu Liu

TL;DR

The paper addresses distributional properties of asynchronous Q-learning with Polyak–Ruppert averaging under Markovian noise and decaying stepsizes. It develops a non-asymptotic central limit theorem in 1-Wasserstein distance and a functional central limit theorem for the averaged and partial-sum iterates, respectively, leveraging a Poisson-equation framework and martingale decompositions. The results provide explicit rate bounds that depend on the state-action space size $|\mathcal{S}||\mathcal{A}|$, exploration quality $\rho$, discount factor $\gamma$, and step-size parameter $\beta$, and extend to convergence to Brownian motion for the partial sums. These findings enable uncertainty quantification and statistical inference for asynchronous Q-learning in finite-sample regimes and offer a foundation for further refinements and extensions to other metrics.

Abstract

This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the dependence on the number of iterations, state-action space size, the discount factor, and the quality of exploration. In addition, we derive a functional central limit theorem, showing that the partial-sum process converges weakly to a Brownian motion.
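To make the object of study concrete, the following is a minimal sketch of asynchronous Q-learning with Polyak-Ruppert averaging on a small tabular MDP. It is an illustration under stated assumptions, not the paper's exact algorithm: the function name `averaged_async_q_learning`, the uniform behavior policy, the zero initialization, and the step-size schedule $\alpha_k = k^{-\beta}$ are all choices made here for illustration. "Asynchronous" means that only the visited state-action entry is updated at each step along a single Markovian trajectory.

```python
import random


def averaged_async_q_learning(P, R, gamma=0.9, beta=0.75, n_steps=20000, seed=0):
    """Asynchronous Q-learning with a running Polyak-Ruppert average.

    P[s][a] is a list of next-state probabilities and R[s][a] is a
    deterministic reward (both hypothetical inputs for this sketch).
    Returns the last iterate Q and the averaged iterate Q_bar.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    Q_bar = [[0.0] * n_actions for _ in range(n_states)]

    s = 0  # single trajectory, so the noise is Markovian
    for k in range(1, n_steps + 1):
        # Assumed behavior policy: uniform over actions.
        a = rng.randrange(n_actions)
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]

        # Assumed decaying step size alpha_k = k^(-beta).
        alpha = 1.0 / k ** beta

        # Asynchronous update: only the visited entry (s, a) changes.
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])

        # Running average of the iterates Q_1, ..., Q_k.
        for i in range(n_states):
            for j in range(n_actions):
                Q_bar[i][j] += (Q[i][j] - Q_bar[i][j]) / k

        s = s_next
    return Q, Q_bar


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP (made-up numbers).
    P = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]]]
    R = [[1.0, 0.0], [0.0, 1.0]]
    Q, Q_bar = averaged_async_q_learning(P, R, n_steps=5000, seed=1)
    print(Q_bar)
```

The central limit theorems summarized above concern the fluctuations of the averaged iterate `Q_bar` around the optimal Q-function; repeating the run over many seeds gives an empirical view of that distribution.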

Paper Structure

This paper contains 12 sections, 14 theorems, 78 equations.

Key Result

lemma 1

Suppose that Assumption asm_2 holds, we have

Theorems & Definitions (21)

  • lemma 1: Proposition 3.1 in chen2021lyapunov
  • theorem 3
  • corollary 1
  • theorem 4
  • lemma 2
  • proof
  • proof
  • theorem 5: Restatement of Theorem 1 in srikant2024rates
  • lemma 3
  • proof
  • ...and 11 more