Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Yixuan Zhang; Qiaomin Xie

TL;DR

This paper analyzes asynchronous Q-learning with a constant stepsize under off-policy Markovian data by casting the joint process as a time-homogeneous Markov chain. It proves distributional convergence in Wasserstein distance at an exponential rate, establishes a central limit theorem for the averaged iterates, and derives an explicit expansion of the asymptotic bias in the stepsize α, which enables Richardson-Romberg (RR) extrapolation to reduce the bias. The work also develops a local linearization approach to handle the nonsmooth operator and provides numerical evidence of bias reduction via RR extrapolation. Overall, the results offer a refined understanding of constant-stepsize Q-learning, with implications for uncertainty quantification and bias reduction in practice.
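
The bias-cancellation idea behind RR extrapolation can be written out in a few lines. The notation below is assumed for illustration and is not taken from the paper: $\bar{Q}_{\alpha}$ is the averaged iterate obtained with stepsize $\alpha$, $Q^*$ is the optimal Q-function, $B$ is the linear bias coefficient, and $r(\alpha)$ collects the higher-order terms. Combining stepsizes $\alpha$ and $2\alpha$ is one common choice, shown here as a sketch; the expectations are understood in the asymptotic (stationary) sense.

```latex
% Assumed notation (not from the source): \bar{Q}_\alpha is the averaged iterate
% with stepsize \alpha, Q^* the optimal Q-function, B the linear bias
% coefficient, and r(\alpha) the higher-order remainder.
\begin{align*}
\mathbb{E}[\bar{Q}_{\alpha}]  &= Q^* + \alpha B + r(\alpha), \\
\mathbb{E}[\bar{Q}_{2\alpha}] &= Q^* + 2\alpha B + r(2\alpha), \\
\mathbb{E}[2\bar{Q}_{\alpha} - \bar{Q}_{2\alpha}] &= Q^* + 2r(\alpha) - r(2\alpha).
\end{align*}
```

The term linear in $\alpha$ cancels, so the extrapolated estimate retains only the higher-order part of the bias.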

Abstract

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate. We also establish a Central Limit Theorem for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates. Moreover, we provide an explicit expansion of the asymptotic bias of the averaged iterate in the stepsize. Specifically, the bias is proportional to the stepsize up to higher-order terms, and we provide an explicit expression for the linear coefficient. This precise characterization of the bias allows the application of the Richardson-Romberg (RR) extrapolation technique to construct a new estimate that is provably closer to the optimal Q function. Numerical results corroborate our theoretical findings on the improvement of the RR extrapolation method.
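
The pipeline described in the abstract can be sketched in code: asynchronous Q-learning with a constant stepsize along a single Markovian trajectory, tail averaging of the iterates, and RR extrapolation from two stepsizes. This is a minimal illustration and not the authors' experimental setup. The small random MDP, the uniform behavior policy, the burn-in length, the stepsize pair (α, 2α), and all function names are assumptions made for this example.

```python
import numpy as np


def optimal_q(P, R, gamma, n_iter=2000):
    """Compute Q* by value iteration. P has shape (S, A, S), R has shape (S, A)."""
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def q_learning_tail_average(P, R, gamma, alpha, n_steps, burn_in, rng):
    """Asynchronous Q-learning with constant stepsize alpha on one trajectory.

    Assumption for this sketch: actions come from a uniform behavior policy
    (off-policy data). Returns the average of the iterates after burn_in steps.
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    Q_sum = np.zeros_like(Q)
    s = 0
    for k in range(n_steps):
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])  # only the visited entry changes
        if k >= burn_in:
            Q_sum += Q
        s = s_next
    return Q_sum / (n_steps - burn_in)


def rr_extrapolate(q_alpha, q_2alpha):
    """Combine estimates from stepsizes alpha and 2*alpha to cancel the linear bias term."""
    return 2.0 * q_alpha - q_2alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.05
    # Hypothetical MDP: random transition kernel and deterministic rewards.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(size=(n_states, n_actions))
    q_star = optimal_q(P, R, gamma)

    q_a = q_learning_tail_average(P, R, gamma, alpha, 20000, 10000, rng)
    q_2a = q_learning_tail_average(P, R, gamma, 2 * alpha, 20000, 10000, rng)
    q_rr = rr_extrapolate(q_a, q_2a)

    print("TA error (alpha):  ", np.abs(q_a - q_star).max())
    print("TA error (2*alpha):", np.abs(q_2a - q_star).max())
    print("RR error:          ", np.abs(q_rr - q_star).max())
```

With a short run like this one, sampling noise can be as large as the bias itself, so a single seed may not show an improvement; longer trajectories or averaging over several independent runs would be needed to see the bias reduction clearly.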

Paper Structure (42 sections, 17 theorems, 148 equations, 2 figures)

Introduction
Related Work
Q-learning.
Stochastic approximation.
Preliminaries
Main Results
Stationary Distribution and Convergence Rate
Central Limit Theorem
Bias Expansion
Tail Average and Richardson-Romberg extrapolation
Polyak-Ruppert Averaging
Richardson-Romberg Extrapolation
Proof Outline
Proof Outline for Theorem \ref{limit4tabular} on Convergence
Proof Outline for Theorem \ref{thm:bias} on Bias Expansion
...and 27 more sections

Key Result

Theorem 1

Suppose that Assumption MC holds, and the stepsize $\alpha$ for Q-learning \eqref{eq:tabular} satisfies

Figures (2)

Figure 1: The errors of tail-averaged (TA) iterates and RR extrapolated iterates with different stepsizes.
Figure 2: The errors of tail-averaged (TA) iterates and RR extrapolated iterates for Q-learning with linear function approximation, with different stepsizes.

Theorems & Definitions (18)

Definition 1
Theorem 1
Corollary 1
Theorem 2
Theorem 3
Corollary 2
Corollary 3
Proposition 1
Proposition 2
Proposition 3
...and 8 more
