Global Convergence of Natural Policy Gradient with Hessian-aided Momentum Variance Reduction

Jie Feng; Ke Wei; Jinchi Chen

TL;DR

This paper introduces NPG-HM, a single-loop natural policy gradient method that uses Hessian-aided momentum variance reduction to improve sample efficiency in reinforcement learning (RL). The method avoids importance sampling and solves the update subproblem with stochastic gradient descent (SGD). The authors prove global last-iterate convergence with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$ under Fisher-non-degenerate policy parameterizations; the analysis rests on a relaxed weak gradient dominance property and a novel error decomposition for the subproblem. MuJoCo-based experiments show NPG-HM outperforming several state-of-the-art policy gradient methods on continuous-control tasks.

Abstract

Natural policy gradient (NPG) and its variants are widely used policy search methods in reinforcement learning. Inspired by prior work, this paper develops a new NPG variant, coined NPG-HM, which uses the Hessian-aided momentum technique for variance reduction and solves the sub-problem via stochastic gradient descent. It is shown that NPG-HM achieves global last-iterate $\varepsilon$-optimality with a sample complexity of $\mathcal{O}(\varepsilon^{-2})$, which is the best known result for natural policy gradient type methods under generic Fisher-non-degenerate policy parameterizations. The convergence analysis is built upon a relaxed weak gradient dominance property tailored for NPG under the compatible function approximation framework, as well as a neat way to decompose the error when handling the sub-problem. Moreover, numerical experiments on MuJoCo-based environments demonstrate the superior performance of NPG-HM over other state-of-the-art policy gradient methods.
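The two ingredients named in the abstract, a Hessian-aided momentum estimator and an SGD-solved sub-problem, can be illustrated with a minimal sketch. The forms used here are assumptions and are not taken from this summary: the estimator is written in the form commonly used for Hessian-aided momentum, $u_t = \beta_t g_t + (1-\beta_t)\big(u_{t-1} + \hat{H}(\hat\theta_t)(\theta_t - \theta_{t-1})\big)$ with $\hat\theta_t$ drawn uniformly on the segment between $\theta_{t-1}$ and $\theta_t$, and the sub-problem is taken to be the quadratic $\min_w \tfrac12 w^\top F w - w^\top u_t$ with $F = \mathbb{E}[s s^\top]$ for score vectors $s$. All function names and the $1/(k+k_0)$ inner step size are hypothetical.

```python
import random


def sample_midpoint(theta_prev, theta_new, rng=random):
    """Draw a point uniformly on the segment between two iterates.

    Assumption: the Hessian-vector product is evaluated at this random point.
    """
    q = rng.random()
    return [q * a + (1.0 - q) * b for a, b in zip(theta_new, theta_prev)]


def hessian_aided_momentum(u_prev, grad_new, hvp_mid, beta):
    """Blend a fresh gradient estimate with the corrected previous estimate.

    hvp_mid is a stochastic Hessian-vector product H(theta_hat) @ (theta_t - theta_{t-1}),
    supplied by the caller (hypothetical interface).
    """
    return [beta * g + (1.0 - beta) * (u + h)
            for u, g, h in zip(u_prev, grad_new, hvp_mid)]


def sgd_subproblem(u, sample_score, num_steps=2000, k0=10):
    """Approximately minimise 0.5 * w^T F w - w^T u by SGD, where F = E[s s^T].

    sample_score() returns one score vector s; s * (s . w) - u is an unbiased
    estimate of the gradient F w - u. The 1/(k + k0) step size is an
    illustrative choice, not the paper's.
    """
    w = [0.0] * len(u)
    for k in range(num_steps):
        s = sample_score()
        proj = sum(si * wi for si, wi in zip(s, w))
        step = 1.0 / (k + k0)
        w = [wi - step * (si * proj - ui) for wi, si, ui in zip(w, s, u)]
    return w
```

In a full loop one would form `u_t` with `hessian_aided_momentum`, pass it to `sgd_subproblem` to obtain an approximate natural-gradient direction `w_t`, and then take an ascent step along `w_t`. No importance weights appear because the correction term uses second-order information rather than a gradient re-evaluated at the old parameters.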

Paper Structure (28 sections, 11 theorems, 83 equations, 1 figure, 2 tables, 2 algorithms)


Introduction
Related works
Convergence of exact policy gradient methods.
Sample complexity for first-order stationary point convergence.
Sample complexity for global convergence.
Main contributions
Paper organization
Preliminaries
Policy gradient
Natural policy gradient
Hessian-aided momentum variance reduction
NPG-HM and Its Global Convergence
NPG-HM
Global convergence of NPG-HM
Proof of Main Theorem
...and 13 more sections

Key Result

Theorem 3.1

Suppose $H = -\frac{1}{\log \gamma} \log(T+\tau_0), \beta_t = \frac{\tau_0}{t+\tau_0}, \alpha_t = \alpha_0 \beta_t^{1/2}, \lambda_t = \lambda_0 \beta_t^{-1/2}, \lambda_0 = \frac{\kappa \tau_0 \alpha_0}{4\mu_F}$ and $\alpha_0= \sqrt{\frac{\mu_F^2}{\kappa\tau_0(12L^2 + 6\nu_h^2)}}$, where $t\geq 1$ a
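The parameter choices in the theorem statement can be evaluated numerically with a short sketch. The formulas are copied from the statement above; the function names are hypothetical, and the constants $\mu_F$, $\kappa$, $\tau_0$, $L$, $\nu_h$ are treated as user-supplied inputs because this excerpt does not define them. $H$ is returned as a real number; any rounding to an integer is left to the caller.

```python
import math


def initial_step_sizes(mu_F, kappa, tau0, L, nu_h):
    """alpha_0 and lambda_0 as written in the theorem statement."""
    alpha0 = math.sqrt(mu_F ** 2 / (kappa * tau0 * (12 * L ** 2 + 6 * nu_h ** 2)))
    lambda0 = kappa * tau0 * alpha0 / (4 * mu_F)
    return alpha0, lambda0


def schedule(t, T, gamma, tau0, alpha0, lambda0):
    """Return (H, beta_t, alpha_t, lambda_t) for iteration t >= 1."""
    beta_t = tau0 / (t + tau0)                   # momentum weight, decays like 1/t
    alpha_t = alpha0 * math.sqrt(beta_t)         # alpha_t = alpha_0 * beta_t^{1/2}
    lambda_t = lambda0 / math.sqrt(beta_t)       # lambda_t = lambda_0 * beta_t^{-1/2}
    H = -math.log(T + tau0) / math.log(gamma)    # chosen so that gamma^H = 1 / (T + tau0)
    return H, beta_t, alpha_t, lambda_t
```

Two consequences follow directly from the formulas: the product $\alpha_t \lambda_t = \alpha_0 \lambda_0$ is constant in $t$, and $\gamma^H = 1/(T+\tau_0)$, so $H$ grows only logarithmically in $T$.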

Figures (1)

Figure 1: Empirical comparison of NPG-HM and other policy gradient methods on six environments.

Theorems & Definitions (20)

Definition 3.1
Remark 3.1
Remark 3.2
Theorem 3.1
Remark 3.3
Lemma 4.1
Remark 4.1
Lemma 4.2
Remark 4.2
Lemma 4.3
...and 10 more
