
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Haoran Li, Zhennan Jiang, Yuhui Chen, Dongbin Zhao

TL;DR

This paper investigates the impact of the non-stationary distribution and the actor-critic framework on consistency policy in online RL, and finds that the consistency policy is unstable during training, especially in visual RL with its high-dimensional state space.

Abstract

With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. Although consistency models, as time-efficient diffusion models, have been validated in online state-based RL, it remains an open question whether they can be extended to visual RL. In this paper, we investigate the impact of the non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that the consistency policy is unstable during training, especially in visual RL with its high-dimensional state space. To address this, we suggest sample-based entropy regularization to stabilize policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, and it demonstrates the potential of consistency models in this setting. More visualization results are available at https://jzndd.github.io/CP3ER-Page/.

Paper Structure (29 sections, 9 equations, 19 figures, 3 tables, 3 algorithms)


Figures (19)

  • Figure 1: The dormant ratios of the policy under the online and offline training.
  • Figure 2: The dormant ratios of the policy networks with different losses and observations.
  • Figure 3: (a) The framework of CP3ER, where PPE is the abbreviation of prioritized proximal experience. (b) The sampling weights $\beta$ with different $\alpha$.
  • Figure 4: Results on medium-level tasks in DeepMind control suite with 5 random seeds.
  • Figure 5: Results on hard-level tasks in DeepMind control suite with 5 random seeds.
  • ...and 14 more figures