Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao
TL;DR
The paper systematically analyzes plasticity loss in visual reinforcement learning (VRL) along three axes: data augmentation, network modules, and training stages. It finds that data augmentation is crucial for preserving plasticity, that the critic's plasticity loss is the principal bottleneck for sample efficiency, and that early-stage plasticity must be recovered promptly to avoid irrecoverable degradation. Based on these insights, it introduces Adaptive RR, which dynamically adjusts the replay ratio (RR) according to the critic's plasticity level, gaining the benefits of higher data reuse without incurring catastrophic plasticity loss. Empirical results on DeepMind Control Suite tasks and Atari-100K demonstrate that Adaptive RR outperforms static replay ratios and traditional interventions, offering a practical strategy to improve sample efficiency in VRL. These findings suggest that VRL algorithms should pair aggressive data reuse with stage-aware plasticity maintenance to achieve robust, sample-efficient learning.
Abstract
Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets (data, modules, and training stages) and derive the following conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover the critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss offsets the sample-efficiency gains that more frequent data reuse would otherwise bring. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
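
To make the idea of adjusting the RR from the critic's plasticity level concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how plasticity is measured or when the RR changes, so several things here are assumptions made purely for illustration: the fraction of active units in the critic serves as the plasticity proxy, the switch from low to high RR happens once that proxy stops changing between checks, and the names `fraction_active` and `AdaptiveReplayRatio` as well as the values `low_rr=0.5`, `high_rr=2.0`, and `tol=0.01` are hypothetical.

```python
from typing import Optional, Sequence


def fraction_active(activations: Sequence[Sequence[float]]) -> float:
    """Assumed plasticity proxy: share of units that fire for at least one sample.

    `activations` is a batch of post-activation vectors (one row per sample).
    """
    n_units = len(activations[0])
    active = sum(
        1 for j in range(n_units) if any(row[j] > 0.0 for row in activations)
    )
    return active / n_units


class AdaptiveReplayRatio:
    """Hypothetical controller: low RR early, high RR once plasticity stabilizes."""

    def __init__(self, low_rr: float = 0.5, high_rr: float = 2.0, tol: float = 0.01):
        self.low_rr = low_rr      # assumed early-stage replay ratio
        self.high_rr = high_rr    # assumed late-stage replay ratio
        self.tol = tol            # assumed threshold for "plasticity has stabilized"
        self.prev: Optional[float] = None
        self.switched = False

    def update(self, plasticity: float) -> float:
        """Record the latest critic plasticity reading and return the RR to use."""
        if not self.switched and self.prev is not None:
            # Switch once, and permanently, when the proxy stops moving.
            if abs(plasticity - self.prev) < self.tol:
                self.switched = True
        self.prev = plasticity
        return self.high_rr if self.switched else self.low_rr


controller = AdaptiveReplayRatio()
# Made-up plasticity readings taken at successive check intervals.
schedule = [controller.update(p) for p in [0.9, 0.7, 0.6, 0.598, 0.4]]
```

In this sketch the controller stays at the low RR while the proxy is still falling quickly, then moves to the high RR and remains there. A one-way switch is a design choice of the sketch; a real training loop would call `update` periodically with a measurement taken from the critic network.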
