Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning
Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov
TL;DR
This work develops non-asymptotic statistical inference for Polyak-Ruppert averaged linear stochastic approximation (LSA) under i.i.d. noise. It proves a Berry-Esseen bound for the multivariate normal approximation of $\sqrt{n}(\bar{\theta}_{n}-\theta^*)$. The bound is tightest for the aggressive step size $\alpha_k \sim k^{-1/2}$, where it decays at rate $n^{-1/4}$ up to logarithmic factors. The paper also establishes the non-asymptotic validity of a multiplier bootstrap that yields confidence sets without requiring knowledge of the asymptotic covariance. The results are specialized to temporal-difference (TD) learning with linear function approximation, with explicit stability conditions and constants. A numerical study of TD learning on Garnet environments illustrates the predicted rates and shows that bootstrap-based confidence intervals are practical in the online setting. Overall, the paper provides a framework for finite-sample normal approximation and bootstrap inference in online linear stochastic approximation, with direct application to value-function estimation in reinforcement learning (RL).
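For reference, the objects in the summary can be written in standard LSA notation. The symbols $\mathbf{A}(Z)$, $\mathbf{b}(Z)$, $c_0$, $k_0$, the averaging range, and the bootstrap weights $W_k^{(m)}$ for $m = 1, \dots, M$ are assumed here for illustration and may differ from the paper's own conventions. The target $\theta^*$ solves a linear system observed only through noisy samples $Z_k$. The LSA iterate is averaged, and each of $M$ perturbed copies reuses the same observation with a random multiplier:

$$
\begin{aligned}
\bar{\mathbf{A}}\theta^* &= \bar{\mathbf{b}}, \qquad \bar{\mathbf{A}} = \mathbb{E}[\mathbf{A}(Z)], \quad \bar{\mathbf{b}} = \mathbb{E}[\mathbf{b}(Z)], \\
\theta_k &= \theta_{k-1} - \alpha_k \{\mathbf{A}(Z_k)\theta_{k-1} - \mathbf{b}(Z_k)\}, \qquad \alpha_k = c_0 (k + k_0)^{-1/2}, \\
\bar{\theta}_n &= \frac{1}{n}\sum_{k=1}^{n} \theta_k, \\
\sqrt{n}(\bar{\theta}_n - \theta^*) &\xrightarrow{d} \mathcal{N}(0, \Sigma_\infty), \qquad \Sigma_\infty = \bar{\mathbf{A}}^{-1} \Sigma_\varepsilon \bar{\mathbf{A}}^{-\top}, \quad \Sigma_\varepsilon = \mathbb{E}[\varepsilon(Z)\varepsilon(Z)^\top], \quad \varepsilon(Z) = \mathbf{A}(Z)\theta^* - \mathbf{b}(Z), \\
\theta_k^{(m)} &= \theta_{k-1}^{(m)} - \alpha_k W_k^{(m)} \{\mathbf{A}(Z_k)\theta_{k-1}^{(m)} - \mathbf{b}(Z_k)\}, \qquad \mathbb{E}[W_k^{(m)}] = 1, \quad \operatorname{Var}(W_k^{(m)}) = 1.
\end{aligned}
$$

Under suitable stability conditions, the asymptotic normality line is the classical Polyak-Ruppert result. A Berry-Esseen bound quantifies how close the law of $\sqrt{n}(\bar{\theta}_n - \theta^*)$ is to $\mathcal{N}(0, \Sigma_\infty)$ at finite $n$. The bootstrap uses the conditional law of $\sqrt{n}(\bar{\theta}_n^{(m)} - \bar{\theta}_n)$, given the data, as a proxy for that law, so $\Sigma_\infty$ never has to be estimated.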
Abstract
In this paper, we obtain a Berry-Esseen bound for the multivariate normal approximation of the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Moreover, we prove the non-asymptotic validity of confidence intervals for parameter estimation with LSA based on the multiplier bootstrap. This procedure updates the LSA estimate together with a set of randomly perturbed LSA estimates each time a new observation arrives. We illustrate our findings in the setting of temporal difference learning with linear function approximation.
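The online procedure described above can be sketched for TD(0) with linear features. TD(0) is the LSA instance with $\mathbf{A}(Z) = \varphi(s)(\varphi(s) - \gamma\varphi(s'))^\top$ and $\mathbf{b}(Z) = r\,\varphi(s)$. This is a minimal illustration under stated assumptions, not the authors' code or experimental setup:

- `make_garnet_like_chain` is a hypothetical, simplified stand-in for a Garnet environment. A fixed policy is folded into the transition matrix, and a cycle edge guarantees irreducibility.
- States are drawn i.i.d. from the stationary distribution to match the i.i.d. noise setting.
- The bootstrap weights are assumed to take the values 0 and 2 with equal probability, giving mean 1 and variance 1.
- The intervals are coordinatewise quantiles of $|\bar{\theta}_n^{(m)} - \bar{\theta}_n|$. The $\sqrt{n}$ factors cancel in this construction.
- All constants and problem sizes are arbitrary.

```python
import numpy as np


def make_garnet_like_chain(n_states, branching, rng):
    """Hypothetical Garnet-style random Markov chain with a fixed policy folded in.

    Each state s gets `branching` random successors plus (s + 1) % n_states,
    so the chain is irreducible; transition weights are Dirichlet(1, ..., 1).
    Rewards r(s, s') are drawn uniformly from [0, 1].
    """
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        succ = rng.choice(n_states, size=branching, replace=False)
        succ = np.unique(np.append(succ, (s + 1) % n_states))
        P[s, succ] = rng.dirichlet(np.ones(succ.size))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_states))
    return P, R


def stationary_distribution(P):
    """Least-squares solution of mu (P - I) = 0 subject to sum(mu) = 1."""
    n = P.shape[0]
    M = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.append(np.zeros(n), 1.0)
    mu = np.clip(np.linalg.lstsq(M, rhs, rcond=None)[0], 0.0, None)
    return mu / mu.sum()


def td_fixed_point(P, R, Phi, gamma, mu):
    """Return A_bar = E[A(Z)] and theta* solving A_bar theta* = b_bar = E[b(Z)]."""
    D = np.diag(mu)
    A_bar = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b_bar = Phi.T @ D @ (P * R).sum(axis=1)  # expected one-step reward per state
    return A_bar, np.linalg.solve(A_bar, b_bar)


def td_lsa_with_bootstrap(P, R, Phi, gamma, mu, n, n_boot, c0, k0, rng):
    """TD(0) as LSA, run alongside n_boot multiplier-perturbed copies.

    Returns the Polyak-Ruppert average of the main iterate, shape (d,),
    and of each perturbed iterate, shape (n_boot, d).
    """
    n_states, d = Phi.shape
    theta, theta_pert = np.zeros(d), np.zeros((n_boot, d))
    avg, avg_pert = np.zeros(d), np.zeros((n_boot, d))
    for k in range(1, n + 1):
        s = rng.choice(n_states, p=mu)            # i.i.d. draw from stationary law
        s_next = rng.choice(n_states, p=P[s])     # one transition from s
        phi, phi_next = Phi[s], Phi[s_next]
        A = np.outer(phi, phi - gamma * phi_next)  # A(Z_k)
        b = R[s, s_next] * phi                     # b(Z_k)
        alpha = c0 / np.sqrt(k + k0)               # aggressive step size ~ k^{-1/2}
        theta = theta - alpha * (A @ theta - b)
        w = 2.0 * rng.integers(0, 2, size=n_boot)  # assumed weights in {0, 2}: mean 1, var 1
        theta_pert = theta_pert - alpha * w[:, None] * (theta_pert @ A.T - b)
        avg += (theta - avg) / k                   # running Polyak-Ruppert averages
        avg_pert += (theta_pert - avg_pert) / k
    return avg, avg_pert


rng = np.random.default_rng(0)
n_states, d, gamma, delta = 10, 3, 0.5, 0.05
P, R = make_garnet_like_chain(n_states, branching=3, rng=rng)
Phi = rng.standard_normal((n_states, d))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm feature rows
mu = stationary_distribution(P)
A_bar, theta_star = td_fixed_point(P, R, Phi, gamma, mu)
theta_bar, theta_bar_pert = td_lsa_with_bootstrap(
    P, R, Phi, gamma, mu, n=10_000, n_boot=200, c0=0.5, k0=10, rng=rng)

# Coordinatewise (1 - delta) intervals from bootstrap quantiles of |avg_pert - avg|.
q = np.quantile(np.abs(theta_bar_pert - theta_bar), 1 - delta, axis=0)
lower, upper = theta_bar - q, theta_bar + q
print("theta*        :", theta_star)
print("PR average    :", theta_bar)
print("lower / upper :", lower, upper)
```

Each observation updates the main iterate and all perturbed copies. The sketch therefore stores only $O(Md)$ numbers for $M$ copies and never forms an estimate of $\Sigma_\infty$. The coordinatewise intervals are a simplification. A joint confidence set would instead use a quantile of a norm of $\bar{\theta}_n^{(m)} - \bar{\theta}_n$.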
