Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation

Yanjie Dong; Haijun Zhang; Gang Wang; Shisheng Cui; Xiping Hu

TL;DR

The paper addresses variance and slow convergence in policy-gradient methods for continuous-state RL by introducing a heavy-ball momentum term into the critic update of an actor-critic framework (HB-A2C). It uses multi-step bootstrapping and a two-timescale online scheme to update actor and critic concurrently, with a linear function approximator for the value function. A new analytical framework bounds gradient bias and optimality drift under Markovian noise, showing that the unified actor-critic recursions converge to an $\epsilon$-stationary point at a rate of ${\cal O}(1/\sqrt{K})$ (with additional ${\cal O}(1/K)$ terms) when the learning rates scale as $\alpha=\Theta(1/\sqrt{K})$ and $\beta=c_5\alpha$. The results reveal how the momentum factor and trajectory length influence convergence, and they provide finite-time guarantees without requiring decaying variance. This work thus offers a principled, momentum-accelerated approach for RL with function approximation in online, Markovian settings, potentially improving data efficiency and convergence speed in continuous control tasks.
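The step-size scaling quoted above can be illustrated with a small helper. This is only a sketch: the function name `learning_rates` and the constant `c_alpha` are hypothetical, and the value of `c5` stands in for the paper's constant $c_5$, whose actual value is not given here.

```python
import math


def learning_rates(K, c_alpha=1.0, c5=1.0):
    """Return (alpha, beta) with alpha = c_alpha / sqrt(K) and beta = c5 * alpha.

    K is the total number of iterations. c_alpha and c5 are placeholder
    constants (assumptions for illustration, not values from the paper).
    """
    if K <= 0:
        raise ValueError("K must be a positive number of iterations")
    alpha = c_alpha / math.sqrt(K)  # alpha = Theta(1 / sqrt(K))
    beta = c5 * alpha               # beta is a constant multiple of alpha
    return alpha, beta
```

Under this scaling, quadrupling the iteration budget `K` halves both step sizes.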

Abstract

By using a parametric value function in place of Monte-Carlo rollouts for value estimation, actor-critic (AC) algorithms reduce the variance of the stochastic policy gradient and thereby improve the convergence rate. While existing works mainly analyze the convergence rate of AC algorithms under Markovian noise, the impact of momentum on AC algorithms remains largely unexplored. In this work, we first propose a heavy-ball momentum based advantage actor-critic (HB-A2C) algorithm by integrating heavy-ball momentum into the critic recursion, which is parameterized by a linear function. When the sample trajectory follows a Markov decision process, we quantitatively certify the acceleration capability of the proposed HB-A2C algorithm. Our theoretical results demonstrate that the proposed HB-A2C finds an $\epsilon$-approximate stationary point within ${\cal O}(\epsilon^{-2})$ iterations for reinforcement learning tasks with Markovian noise. Moreover, we reveal the dependence of the learning rates on the length of the sample trajectory. By carefully selecting the momentum factor of the critic recursion, the proposed HB-A2C can balance the errors introduced by the initialization and the stochastic approximation.
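To make the idea of a heavy-ball critic recursion concrete, the following is a minimal sketch and not the paper's exact algorithm. It assumes a linear value function $V(s) = \phi(s)^\top w$, a $T$-step bootstrapped TD error, and the classical Polyak heavy-ball form $w_{k+1} = w_k + \beta g_k + \eta (w_k - w_{k-1})$. The function names, the momentum symbol `eta`, and the feature vectors are all illustrative assumptions.

```python
import numpy as np


def t_step_semi_gradient(w, phi_start, phi_end, rewards, gamma):
    """Semi-gradient of the T-step TD error for V(s) = phi(s) @ w.

    phi_start / phi_end are the feature vectors of the first and last
    state of a length-T trajectory segment (hypothetical inputs).
    """
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    discounts = gamma ** np.arange(T)
    # Discounted reward sum plus the bootstrapped value of the final state.
    target = float(discounts @ rewards) + gamma**T * float(phi_end @ w)
    td_error = target - float(phi_start @ w)
    return td_error * phi_start


def heavy_ball_critic_step(w, w_prev, grad, beta, eta):
    """One heavy-ball update: semi-gradient step plus eta times the last displacement.

    Returns (w_next, w) so the caller can carry the previous iterate forward.
    """
    w_next = w + beta * grad + eta * (w - w_prev)
    return w_next, w
```

Setting `eta = 0` recovers a plain TD-style critic update, which makes it easy to compare the two recursions on the same trajectory.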

Paper Structure (17 sections, 9 theorems, 87 equations, 1 algorithm)

Introduction
Preliminaries
Problem description
Function approximation
Heavy-Ball Based Actor-Critic for RL Tasks
Algorithm development
Convergence analysis
Proof of Lemma \ref{lemmas:01}
Proof of Lemma \ref{lemmas:02}
Proof of Lemma \ref{lemmas:05}
Proof of Lemma \ref{lemmas:03}
Proof of Theorem \ref{thm:02}
Proof of Theorem \ref{thm:01}
Analysis of the Gradient Variance
Analysis of the Optimality Drift
...and 2 more sections

Key Result

Lemma 1

The (stochastic) semi-gradient of the critic and the (stochastic) policy gradient of the actor are bounded in norm by $R_g$ and $R_h$, respectively, where $R_g = (1+\gamma^T) R_w + c_1(\gamma)R_r$ and $R_h = R_\pi[R_r + (1+\gamma)R_w]$ with $c_1(\gamma) = (1 - \gamma^T)/(1-\gamma)$.
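The constants in Lemma 1 are easy to evaluate numerically. The sketch below simply transcribes the stated formulas. The reading of $R_w$, $R_r$, and $R_\pi$ as bounds on the critic parameter, the reward, and the policy score function is an assumption, as is the function name `lemma1_constants`. Note that $c_1(\gamma)$ equals the geometric sum $\sum_{t=0}^{T-1}\gamma^t$ for $\gamma \in [0, 1)$.

```python
def lemma1_constants(gamma, T, R_w, R_r, R_pi):
    """Evaluate c_1(gamma), R_g and R_h from the formulas stated in Lemma 1.

    Requires 0 <= gamma < 1 so that the closed form of c_1 is well defined.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    c1 = (1.0 - gamma**T) / (1.0 - gamma)      # geometric sum of discounts
    R_g = (1.0 + gamma**T) * R_w + c1 * R_r    # constant for the critic semi-gradient
    R_h = R_pi * (R_r + (1.0 + gamma) * R_w)   # constant for the actor policy gradient
    return c1, R_g, R_h
```

For example, with `gamma = 0.5` and `T = 2` the discount sum is `1 + 0.5 = 1.5`, so a longer trajectory (larger `T`) increases the reward contribution to $R_g$ while shrinking the bootstrapping factor $\gamma^T$.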

Theorems & Definitions (11)

Remark 1
Remark 2
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Theorem 1
Theorem 2
Corollary 1
Lemma 5
...and 1 more
