A Tighter Convergence Proof of Reverse Experience Replay

Nan Jiang; Jinzhao Li; Yexiang Xue

TL;DR

This work tightens the finite-sample analysis of Reverse Experience Replay (RER) for Q-learning in linear MDPs by replacing an overly restrictive bound with a tighter one derived by combinatorial counting. It removes the strict requirement that the product $\eta L$ of learning rate and sequence length be at most $1/3$, showing that convergence can be achieved with larger learning rates and longer reverse sequences. A bias-variance decomposition is used to derive a high-probability bound on the Q-function error, and the analysis yields a concrete sample complexity expression whose contraction factors depend on $\eta$, $L$, $N$, the mixing constant $\kappa$, and the feature bound $C_\Phi$. The results narrow the gap between theory and practice by bringing the theoretical guarantees in line with the empirical gains observed for RER, and the combinatorial counting technique may inspire similar analyses in related domains.

Abstract

In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method. RER requires the learning algorithm to update its parameters using consecutive state-action-reward tuples in reverse order. However, the most recent theoretical analysis holds only for a very small learning rate and short sequences of consecutive steps, a regime in which RER converges more slowly than algorithms that use a large learning rate without RER. To close this gap between theory and practice, we provide a tighter analysis that relaxes the limitation on the learning rate and on the length of the consecutive steps. Furthermore, we show theoretically that RER converges with a larger learning rate and a longer sequence.
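To make the reverse-order update concrete, here is a minimal sketch of one replay pass for Q-learning with linear function approximation. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the toy chain MDP, the one-hot `features` map, the `replay_pass` helper, and the choice to bootstrap from the current weights rather than a separate target parameter are all hypothetical.

```python
import numpy as np

# Hypothetical toy setting: a 4-state chain with a single action,
# where state 3 is terminal and entering it yields reward 1.
N_STATES, N_ACTIONS = 4, 1


def features(s, a):
    """One-hot feature vector phi(s, a); an assumption made for this sketch."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def replay_pass(w, batch, eta, gamma, reverse=True):
    """Apply one Q-learning update per transition, optionally in reverse order.

    Each transition is (s, a, r, s_next, done). Bootstrapping uses the
    current weights, which is a simplification chosen for this sketch.
    """
    w = w.copy()
    order = reversed(batch) if reverse else batch
    for s, a, r, s_next, done in order:
        phi = features(s, a)
        if done:
            next_q = 0.0
        else:
            next_q = max(features(s_next, b) @ w for b in range(N_ACTIONS))
        td_error = r + gamma * next_q - phi @ w
        w += eta * td_error * phi
    return w


# L = 3 consecutive transitions collected along the chain.
batch = [
    (0, 0, 0.0, 1, False),
    (1, 0, 0.0, 2, False),
    (2, 0, 1.0, 3, True),
]
w0 = np.zeros(N_STATES * N_ACTIONS)
w_rev = replay_pass(w0, batch, eta=0.5, gamma=0.9, reverse=True)
w_fwd = replay_pass(w0, batch, eta=0.5, gamma=0.9, reverse=False)
print("reverse order:", w_rev[:3])
print("forward order:", w_fwd[:3])
```

In this toy run, the reverse pass carries the terminal reward back to every state on the trajectory within a single pass, whereas the forward pass only updates the last state. That is the intuition behind replaying consecutive transitions in reverse order.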

Paper Structure (27 sections, 24 theorems, 60 equations, 3 figures, 1 algorithm)

Introduction
Preliminaries
Markov Decision Process
Value Function and $Q$-Function
$Q$-learning
$Q$-learning with Function Approximation
Experience Replay
Reverse Experience Replay
Problem Setups for Reverse Experience Replay
Linear MDP Assumption
Methodology
Motivation
Numerical Justification of the Tighter Bound
Relaxing the Requirement $\eta L\le 1/3$ through Combinatorial Counting
Sample Complexity of Reverse Experience Replay-Based $Q$-Learning on Linear MDPs
...and 12 more sections

Key Result

Theorem 1

Let $\mu$ be the stationary distribution of the state-action pair in the MDP. For $\eta \in (0,1)$, the theorem states matrix inequalities that hold in the positive semi-definite order (the displayed inequalities are not reproduced here). The matrix $\Gamma_L$ is defined in Definition \ref{def:gamma}. The relation $\preceq$ between the matrices on both sides is defined in Definition \ref{def:psd} and refers to the positive semi-definite ordering.

Figures (3)

Figure 1: For all the sequence lengths considered, the value of our derived expression is numerically higher than that of the original expression, which implies that our bound (in Lemma \ref{lem:combi-weighted}) is tighter than the original one in Agarwal et al. (ICLR 2022).
Figure 2: Case 1 in the proposed combinatorial counting procedure. This case illustrates how many terms of the form $\phi_{l_1} \phi^\top_{l_1} \ldots \phi_{l_k} \phi^\top_{l_k}$ can be reduced to $\phi_l \phi_l^\top$ for a fixed $l$ using Lemma \ref{lem:relax}, where $1 \leq l \leq L$. If $l_1$ is assigned to the left $l$-th slot, then $l_k$ cannot choose any of the left terms with indices $L, \ldots, l+1$, due to the sequential ordering constraint that $l_i$ must be to the right of $l_{i-1}$. To avoid double counting, $l_k$ is also disallowed from occupying the right $l$-th slot. Consequently, there are $L + l - 2$ available slots for assigning the remaining sequence $l_2, \ldots, l_k$ of length $k-1$. Therefore, there are $\binom{L + l - 2}{k - 1}$ such terms for this case. Further cases are illustrated in Figure 3 in the appendix.
Figure 3: Visualization of all cases for the combinatorial counting problem.
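
As a quick numerical companion to the Case 1 count in Figure 2, the sketch below tabulates $\binom{L + l - 2}{k - 1}$ for small values. The function name `case1_term_count` and the chosen values of $L$, $l$, and $k$ are illustrative assumptions; only the binomial expression itself comes from the caption.

```python
from math import comb


def case1_term_count(L, l, k):
    """Number of Case 1 terms: choose k - 1 of the L + l - 2 available slots.

    The formula is taken from the Figure 2 caption; the argument checks
    are assumptions added for this sketch.
    """
    if not 1 <= l <= L:
        raise ValueError("l must satisfy 1 <= l <= L")
    if k < 1:
        raise ValueError("k must be at least 1")
    return comb(L + l - 2, k - 1)


# Tabulate the count for a hypothetical sequence length L = 4 and k = 3.
for l in range(1, 5):
    print(f"L=4, l={l}, k=3 -> {case1_term_count(4, l, 3)} terms")
```

When $k - 1$ exceeds the number of available slots, `math.comb` returns 0, matching the fact that no such term exists.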

Theorems & Definitions (42)

Remark 1
Definition 1
Definition 2
Theorem 1
proof : Proof Sketch
Lemma 1
Lemma 2
proof : Sketch of Proof
Lemma 3
Lemma 4: Bias and variance decomposition
...and 32 more
