Addressing the Plasticity-Stability Dilemma in Reinforcement Learning
Mansi Maheshwari, John C. Raisbeck, Bruno Castro da Silva
TL;DR
The paper tackles plasticity loss in reinforcement learning by proposing AltNet, a dual-network, reset-based framework in which two networks take turns interacting with the environment while both learn off-policy from a shared replay buffer. By resetting the acting network at fixed intervals and swapping roles, AltNet restores plasticity while preserving stability, avoiding the post-reset performance drops that afflict prior reset methods. Empirical results on the DeepMind Control Suite show that AltNet improves sample efficiency and final performance in both off-policy and on-policy settings, and ablations demonstrate the critical roles of replay-buffer preservation and alternating resets. The approach offers a practical path toward stable, continual learning in RL and points to future work on adaptive reset scheduling and validation in broader domains.
Abstract
Neural networks have shown remarkable success in supervised learning when trained on a single task using a fixed dataset. However, when they are trained on a reinforcement learning task, their ability to keep learning from new experiences declines over time. This decline in learning ability is known as plasticity loss. To restore plasticity, prior work has explored periodically resetting the parameters of the learning network, a strategy that often improves overall performance. However, such resets come at the cost of a temporary drop in performance, which can be dangerous in real-world settings. To overcome this instability, we introduce AltNet, a reset-based approach that restores plasticity without performance degradation by leveraging twin networks. The twin networks anchor performance during resets by periodically alternating roles: one network learns as it acts in the environment, while the other learns off-policy from the active network's interactions and a replay buffer. At fixed intervals, the active network is reset and the passive network, having learned from prior experiences, becomes the new active network. AltNet restores plasticity, improving sample efficiency and achieving higher performance, while avoiding the performance drops that pose risks in safety-critical settings. We demonstrate these advantages in several high-dimensional control tasks from the DeepMind Control Suite, where AltNet outperforms various relevant baseline methods, as well as state-of-the-art reset-based techniques.
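The alternating schedule described above can be sketched as a small, framework-free loop. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an off-policy learner in which both networks sample minibatches from one shared replay buffer, and `ToyNetwork`, `altnet_loop`, the placeholder environment, and the counter-based "updates" are all hypothetical stand-ins that perform no real learning. The loop only tracks how many updates each network has received since its last reset, so the role swap is easy to inspect.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyNetwork:
    """Hypothetical stand-in for an agent network; it only counts updates."""
    name: str
    updates: int = 0  # gradient steps since the last reset
    resets: int = 0

    def act(self, obs):
        return random.choice([0, 1])  # placeholder policy, ignores obs

    def update(self, batch):
        self.updates += 1  # placeholder for one gradient step on `batch`

    def reset(self):
        self.updates = 0  # stands in for re-initializing all parameters
        self.resets += 1


def altnet_loop(total_steps, reset_interval, batch_size=4, buffer_size=1000, seed=0):
    random.seed(seed)
    nets = [ToyNetwork("A"), ToyNetwork("B")]
    active, passive = 0, 1
    buffer = deque(maxlen=buffer_size)  # shared replay buffer; never cleared on reset
    handoffs = []  # (step, new active network, its updates since last reset)
    obs = 0
    for t in range(1, total_steps + 1):
        action = nets[active].act(obs)  # only the active network acts
        next_obs = obs + 1              # placeholder environment transition
        buffer.append((obs, action, next_obs))
        obs = next_obs
        if len(buffer) >= batch_size:
            batch = random.sample(list(buffer), batch_size)
            nets[active].update(batch)   # learns from the data it is collecting
            nets[passive].update(batch)  # learns off-policy from the same buffer
        if t % reset_interval == 0:
            nets[active].reset()               # reset the acting network...
            active, passive = passive, active  # ...and hand control to the trained one
            handoffs.append((t, nets[active].name, nets[active].updates))
    return nets, buffer, handoffs


nets, buffer, handoffs = altnet_loop(total_steps=30, reset_interval=10)
print(handoffs)  # [(10, 'B', 7), (20, 'A', 10), (30, 'B', 10)]
```

In this sketch, every handoff passes control to a network that has already trained for a full interval since its last reset (only 7 updates at the first handoff, because learning starts once the buffer holds `batch_size` transitions), so the acting network is never freshly initialized. The buffer also survives every reset, mirroring the replay-buffer preservation that the ablations single out.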
