Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao

TL;DR

The paper systematically analyzes plasticity loss in visual reinforcement learning along three axes: data augmentation, network modules, and training stages. It finds that data augmentation is crucial for preserving plasticity, the critic's plasticity loss is the principal bottleneck for sample efficiency, and early-stage plasticity must be recovered promptly to avoid irrecoverable degradation. Based on these insights, it introduces Adaptive RR, which dynamically adjusts replay ratio according to the critic's plasticity level to reap the benefits of higher data reuse without incurring catastrophic plasticity loss. Empirical results on DeepMind Control Suite tasks and Atari-100K demonstrate that Adaptive RR outperforms static replay ratios and traditional interventions, offering a practical strategy to improve sample efficiency in VRL. These findings have implications for VRL algorithm design, suggesting that stage-aware plasticity maintenance should accompany aggressive data reuse to achieve robust, sample-efficient learning.

Abstract

Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover the critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvements in sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
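
As a rough illustration of the idea, the following Python sketch switches from a low to a high replay ratio once a critic plasticity metric stops changing between checks. The class name `AdaptiveReplayRatio`, the specific RR values, the threshold, and the switching rule itself are assumptions made for illustration; the abstract only states that the RR is adjusted according to the critic's plasticity level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveReplayRatio:
    """Hypothetical scheduler: keep a low RR early, switch to a high RR later.

    All default values are illustrative assumptions, not taken from the text.
    """

    low_rr: float = 0.5       # assumed replay ratio for the early stage
    high_rr: float = 2.0      # assumed replay ratio after the switch
    threshold: float = 1e-3   # assumed tolerance on the metric's change
    switched: bool = False
    _previous: Optional[float] = None

    def update(self, critic_plasticity: float) -> float:
        """Record the latest critic plasticity estimate and return the RR to use."""
        if not self.switched and self._previous is not None:
            # Treat a small change between consecutive checks as a sign that
            # the critic's plasticity has stabilized.
            if abs(critic_plasticity - self._previous) < self.threshold:
                self.switched = True
        self._previous = critic_plasticity
        return self.high_rr if self.switched else self.low_rr


# Usage: feed in one plasticity estimate per check interval.
scheduler = AdaptiveReplayRatio(low_rr=0.5, high_rr=2.0, threshold=0.01)
schedule = [scheduler.update(p) for p in [0.9, 0.7, 0.6, 0.595, 0.4]]
```

In a training loop, `update` would be called at a fixed interval with the current plasticity estimate of the critic, and the returned value would set how many gradient updates are performed per environment step. A one-way switch is used here because the abstract describes limited reuse early and more frequent reuse later; other schedules are possible.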

Paper Structure

This paper contains 20 sections, 1 equation, 22 figures, 5 tables, 1 algorithm.

Figures (22)

  • Figure 1: Training curves for the four combinations of including or excluding Reset and data augmentation (DA). We adopt DrQ-v2 as our baseline algorithm and follow the Reset settings from [primacy_bias]. Mean and std are estimated over 5 runs. Note that re-initializing 10 times in the Quadruped Run task resulted in poor performance, prompting us to reduce the number of resets to 5. For ablation studies on the number of resets and results on other tasks, please refer to the Appendix (Reset).
  • Figure 2: Performance of various interventions in Cheetah Run across 5 seeds.
  • Figure 2: Summary of Atari-100K results. Comprehensive scores are available in the Appendix (Evaluation on Atari).
  • Figure 3: Different FAU trends across modules throughout training. The plasticity of encoder and actor displays similar trends whether DA is employed or not. Conversely, integrating DA leads to a marked improvement in the critic's plasticity. Further comparative results are in the Appendix (FAU trends).
  • Figure 4: Learning curves of DrQ-v2 using a frozen ImageNet pre-trained encoder, with and without DA.
  • ...and 17 more figures
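
The Figure 3 caption tracks plasticity with FAU. Assuming FAU stands for the fraction of active units, that is, the share of post-ReLU activations that are positive over a batch of inputs, it could be computed along the following lines. The function name and the nested-list input layout are hypothetical and chosen only to keep the sketch dependency-free.

```python
from typing import Iterable, Sequence


def fraction_of_active_units(
    layers: Iterable[Sequence[Sequence[float]]],
) -> float:
    """Return the share of activations that are strictly positive.

    `layers` holds one entry per layer of a module (for example the critic);
    each entry is a batch of activation vectors, indexed as [sample][unit].
    This layout is an assumption made for the sketch.
    """
    active = 0
    total = 0
    for layer in layers:
        for sample in layer:
            active += sum(1 for value in sample if value > 0)
            total += len(sample)
    if total == 0:
        raise ValueError("no activations provided")
    return active / total


# Usage: two layers, with batches of two samples and one sample respectively.
critic_activations = [
    [[0.0, 1.0], [2.0, 0.0]],   # layer 1: 2 of 4 activations are positive
    [[0.5, 0.0, 0.0, 0.0]],     # layer 2: 1 of 4 activations is positive
]
fau = fraction_of_active_units(critic_activations)
```

Under this reading, a lower value means more units are inactive on the sampled inputs, which is the sense in which the caption compares the encoder, actor, and critic.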