A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation
Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang
TL;DR
The paper addresses distributional policy evaluation with linear function approximation and proves finite-sample, non-asymptotic rates for a linear-categorical TD algorithm (Linear-CTD). By formulating a linear-categorical projected Bellman equation and applying exponential stability arguments for products of random matrices, it shows that learning the full return distribution is statistically as tractable as learning its mean under linear function approximation. Theoretical results include instance-dependent and instance-independent step-size bounds, high-probability guarantees for Markovian data, and a mean-preserving property of Linear-CTD, together with a preconditioning technique that removes the dependence on $K$ from the sample complexity. Experiments corroborate the convergence results and show Linear-CTD’s advantages over baseline PMF-based methods, especially as the number of categorical supports $K$ grows. Overall, the work closes a gap in distributional RL by matching the non-asymptotic efficiency of classic TD learning in the linear function approximation setting, and it outlines variance reduction as an avenue for further improvement.
Abstract
In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy $\pi$. Previous works on the statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return from streaming data is no more difficult than learning its expectation (the value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ exponential stability arguments for products of random matrices. Our results provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.
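To make the categorical representation concrete, the following is a minimal sketch of the standard categorical projection used in categorical distributional RL: the distribution of a one-step target $r + \gamma G$ is mapped onto $K+1$ evenly spaced atoms by linear interpolation. It is not the Linear-CTD update analysed in the paper. The function name `categorical_projection` and the parameters `v_min` and `v_max` (the endpoints of the atom grid) are assumptions made for illustration only. The sketch also shows that this projection preserves the mean whenever no target is clipped, which is the intuition behind relating a categorical estimate to the value function.

```python
def categorical_projection(atoms, probs, reward, gamma, v_min, v_max, K):
    """Project the law of reward + gamma * G onto K + 1 evenly spaced atoms.

    atoms, probs: support points and probabilities of the next-state return G.
    v_min, v_max, K: hypothetical grid parameters (atoms at v_min + k * step).
    """
    step = (v_max - v_min) / K
    out = [0.0] * (K + 1)
    for z, p in zip(atoms, probs):
        # Shift and scale the atom, then clip it into the grid range.
        target = min(max(reward + gamma * z, v_min), v_max)
        pos = (target - v_min) / step      # fractional grid index in [0, K]
        lo = min(int(pos), K - 1)          # index of the left neighbour
        w_hi = pos - lo                    # weight given to the right neighbour
        out[lo] += p * (1.0 - w_hi)
        out[lo + 1] += p * w_hi
    return out


# Example: 5 atoms on [0, 10]; the next-state return has mean 5.0.
grid = [0.0, 2.5, 5.0, 7.5, 10.0]
next_probs = [0.1, 0.2, 0.4, 0.2, 0.1]
projected = categorical_projection(grid, next_probs, 1.0, 0.9, 0.0, 10.0, 4)

# No target is clipped here, so the projected mean equals
# reward + gamma * 5.0 = 5.5 up to floating-point rounding.
projected_mean = sum(x * p for x, p in zip(grid, projected))
```

Linear interpolation splits each probability mass between the two neighbouring atoms in proportion to distance, which is why the first moment is unchanged when every target lies inside the grid.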
