A Tighter Convergence Proof of Reverse Experience Replay
Nan Jiang, Jinzhao Li, Yexiang Xue
TL;DR
This work tightens the finite-sample analysis of Reverse Experience Replay (RER) for Q-learning in linear MDPs by replacing a prohibitive combinatorial bound with a combinatorial counting argument. It relaxes the strict requirement that the product $\eta L$ of the learning rate $\eta$ and the reverse sequence length $L$ be at most $1/3$, showing that convergence can be achieved with larger learning rates and longer reverse sequences. A bias-variance decomposition is used to derive a high-probability bound on the Q-function error, and the analysis yields a concrete sample complexity expression whose contraction factors depend on $\eta$, $L$, $N$, the mixing constant $\kappa$, and the feature bound $C_\Phi$. The results narrow the gap between theory and practice by bringing the theoretical guarantees closer to the empirical gains observed for RER, and the combinatorial counting technique may inspire similar analyses in related domains.
Abstract
In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method. RER requires the learning algorithm to update the parameters through consecutive state-action-reward tuples in reverse order. However, the most recent theoretical analysis holds only for a very small learning rate and short sequences of consecutive steps, a setting in which RER converges more slowly than algorithms that use a large learning rate without RER. In view of this gap between theory and practice, we provide a tighter analysis that mitigates the limitation on the learning rate and the length of consecutive steps. Furthermore, we show theoretically that RER converges with a larger learning rate and a longer sequence.
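
To make the reverse-order update concrete, the following is a minimal sketch rather than the algorithm analysed in the paper. It assumes a linear Q-function $Q(s,a) = \phi(s,a)^\top \theta$ and a plain one-step Q-learning update, and it leaves out how sequences are sampled from the replay buffer. The names `linear_q_pass` and `one_hot`, and the three-step chain used for the demonstration, are hypothetical and chosen only for illustration.

```python
import numpy as np

def one_hot(state, action, n_states=4):
    """Hypothetical feature map: one-hot over states (a single action is assumed)."""
    feat = np.zeros(n_states)
    feat[state] = 1.0
    return feat

def linear_q_pass(theta, transitions, phi, actions, eta, gamma, reverse=True):
    """One pass of linear Q-learning over a sequence of consecutive transitions.

    With reverse=True the tuples are processed last-to-first, as in RER.
    Bootstrapping from the current theta is a simplification of this sketch.
    """
    theta = theta.copy()
    order = reversed(transitions) if reverse else transitions
    for s, a, r, s_next in order:
        # Greedy value of the next state under the current parameters.
        q_next = max(phi(s_next, b) @ theta for b in actions)
        feat = phi(s, a)
        td_error = r + gamma * q_next - feat @ theta
        theta = theta + eta * td_error * feat
    return theta

# A chain 0 -> 1 -> 2 -> 3 with reward 1 only on the last step;
# state 3 is terminal and its value stays at zero.
chain = [(0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 0, 1.0, 3)]
theta0 = np.zeros(4)

theta_rev = linear_q_pass(theta0, chain, one_hot, [0], eta=0.5, gamma=0.9, reverse=True)
theta_fwd = linear_q_pass(theta0, chain, one_hot, [0], eta=0.5, gamma=0.9, reverse=False)
```

In this toy chain, a single reverse pass carries the final reward back to the first state, whereas a forward pass over the same tuples updates only the state adjacent to the reward. The sketch also suggests why the product $\eta L$ appears in the analysis: the learning rate scales each update, and the sequence length sets how many such updates are chained within one pass.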
