Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization

Haoran Li; Zhennan Jiang; Yuhui Chen; Dongbin Zhao

TL;DR

This paper investigates how non-stationary distributions and the actor-critic framework affect consistency policy in online RL, and finds that consistency policy is unstable during training, especially in visual RL with its high-dimensional state space.

Abstract

With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and poor training stability. As time-efficient diffusion models, consistency models have been validated in online state-based RL, but whether they can be extended to visual RL remains an open question. In this paper, we investigate the impact of non-stationary distributions and the actor-critic framework on consistency policy in online RL, and find that consistency policy is unstable during training, especially in visual RL with its high-dimensional state space. To address this, we suggest sample-based entropy regularization to stabilize policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance on 21 tasks across the DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL, and it demonstrates the potential of consistency models in visual RL. More visualization results are available at https://jzndd.github.io/CP3ER-Page/.

Paper Structure (29 sections, 9 equations, 19 figures, 3 tables, 3 algorithms)

This paper contains 29 sections, 9 equations, 19 figures, 3 tables, 3 algorithms.

Introduction
Related Work
Diffusion Model in Reinforcement Learning
Visual Reinforcement Learning
Preliminary
Reinforcement Learning
Consistency Policy
Dormant Ratio of Neural Networks
Is Consistency-AC Applicable to Visual RL?
Consistency Policy with Prioritized Proximal Experience Regularization
Experiments
Visual Continuous Control Tasks
Does CP3ER have performance advantages compared to current SOTA methods?
Ablation Study
Can policy regularization improve the behavior of the policy during training?
...and 14 more sections

Figures (19)

Figure 1: The dormant ratios of the policy under online and offline training.
Figure 2: The dormant ratios of the policy networks with different losses and observations.
Figure 3: (a) The framework of CP3ER, where PPE is the abbreviation of prioritized proximal experience. (b) The sampling weights $\beta$ with different $\alpha$.
Figure 4: Results on medium-level tasks in DeepMind control suite with 5 random seeds.
Figure 5: Results on hard-level tasks in DeepMind control suite with 5 random seeds.
...and 14 more figures
