Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning
Jiahua Lan, Sen Zhang, Haixia Pan, Ruijun Liu, Li Shen, Dacheng Tao
TL;DR
The paper tackles the stability-plasticity dilemma in deep reinforcement learning by introducing NBSP, a neuron-level approach that identifies RL skill neurons through a goal-oriented activation score and preserves their prior activations with gradient masking while allowing non-skill neurons to adapt. NBSP combines this with experience replay to reinforce past knowledge. The method is evaluated under cycling task protocols with SAC on the Meta-World and Atari benchmarks. Results show that NBSP achieves a better stability-plasticity balance than existing approaches, as evidenced by higher ASR/AR, a lower Forgetting Measure, and strong Forward Transfer. Ablations establish the individual contributions of gradient masking, replay, and neuron identification. The work advances continual DRL by showing the importance of neuron-level control, and it suggests directions for model distillation and extensions to other learning paradigms.
Abstract
In contrast to the human ability to continuously acquire knowledge, agents in deep reinforcement learning (DRL) struggle with the stability-plasticity dilemma, which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods balance these two aspects at the network level and therefore lack differentiation between, and fine-grained control over, individual neurons. To overcome this limitation, we propose the Neuron-level Balance between Stability and Plasticity (NBSP) method, inspired by the observation that specific neurons are strongly associated with task-relevant skills. Specifically, NBSP (1) defines and identifies RL skill neurons that are crucial for knowledge retention through a goal-oriented method, and then (2) introduces a framework that applies gradient masking and experience replay to these neurons, preserving the skills they encode while enabling adaptation to new tasks. Extensive experiments on the Meta-World and Atari benchmarks demonstrate that NBSP significantly outperforms existing approaches in balancing stability and plasticity.
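To make the neuron-level idea concrete, the following is a minimal sketch of gradient masking on selected neurons, not the authors' implementation. It rests on several assumptions that the abstract does not state: per-neuron scores are taken as given (the goal-oriented scoring rule itself is not reproduced), skill neurons are chosen as a hypothetical top fraction of those scores, and the mask is a hard one that scales skill-neuron gradients by `alpha = 0.0`. The names `select_skill_neurons`, `build_mask`, `masked_sgd_step`, `top_fraction`, and `alpha` are illustrative only, and experience replay is omitted.

```python
from typing import List


def select_skill_neurons(scores: List[float], top_fraction: float) -> List[int]:
    """Return indices of the highest-scoring neurons.

    Assumption: a simple top-fraction rule stands in for the paper's
    goal-oriented identification, which is not specified here.
    """
    k = max(1, int(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def build_mask(num_neurons: int, skill_neurons: List[int], alpha: float = 0.0) -> List[float]:
    """Per-neuron gradient scale: alpha for skill neurons, 1.0 for all others."""
    skill = set(skill_neurons)
    return [alpha if i in skill else 1.0 for i in range(num_neurons)]


def masked_sgd_step(
    weights: List[List[float]],
    grads: List[List[float]],
    mask: List[float],
    lr: float,
) -> List[List[float]]:
    """One plain SGD step; row i holds the incoming weights of neuron i.

    Gradients of masked neurons are scaled down (to zero when alpha = 0),
    so their weights stay fixed while the remaining neurons keep learning.
    """
    return [
        [w - lr * mask[i] * g for w, g in zip(w_row, g_row)]
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]


# Toy usage: four neurons with two incoming weights each.
scores = [0.9, 0.1, 0.7, 0.2]          # hypothetical per-neuron scores
skill = select_skill_neurons(scores, top_fraction=0.5)
mask = build_mask(len(scores), skill)
weights = [[1.0, 1.0] for _ in scores]
grads = [[1.0, 1.0] for _ in scores]
updated = masked_sgd_step(weights, grads, mask, lr=0.5)
```

In this toy run, neurons 0 and 2 are treated as skill neurons and keep their weights, while neurons 1 and 3 are updated as usual. A soft mask (`0 < alpha < 1`) would instead slow, rather than freeze, the skill neurons; which variant matches the paper is not determined by the abstract.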
