Neuron-level Balance between Stability and Plasticity in Deep Reinforcement Learning

Jiahua Lan, Sen Zhang, Haixia Pan, Ruijun Liu, Li Shen, Dacheng Tao

TL;DR

The paper tackles the stability-plasticity dilemma in deep reinforcement learning by introducing NBSP, a neuron-level approach that identifies RL skill neurons through a goal-oriented activation score and preserves their prior activations with gradient masking while allowing non-skill neurons to adapt. It combines this with experience replay to reinforce past knowledge. The method is evaluated under cycling-task protocols, using SAC on the Meta-World and Atari benchmarks. Results show that NBSP achieves a better stability-plasticity balance than the compared methods, evidenced by higher ASR/AR, a lower Forgetting Measure, and strong Forward Transfer. Ablations isolate the contributions of gradient masking, replay, and neuron identification. The work advances continual DRL by showing the importance of neuron-level control, and it suggests directions for model distillation and extensions to other learning paradigms.

Abstract

In contrast to the human ability to continuously acquire knowledge, agents struggle with the stability-plasticity dilemma in deep reinforcement learning (DRL), which refers to the trade-off between retaining existing skills (stability) and learning new knowledge (plasticity). Current methods focus on balancing these two aspects at the network level, lacking sufficient differentiation and fine-grained control of individual neurons. To overcome this limitation, we propose the Neuron-level Balance between Stability and Plasticity (NBSP) method, inspired by the observation that specific neurons are strongly tied to task-relevant skills. Specifically, NBSP (1) defines and identifies RL skill neurons that are crucial for knowledge retention through a goal-oriented method, and then (2) introduces a framework that applies gradient masking and experience replay to these neurons, preserving the encoded skills while still enabling adaptation to new tasks. Extensive experiments on the Meta-World and Atari benchmarks demonstrate that NBSP significantly outperforms existing approaches in balancing stability and plasticity.
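
To make the idea of goal-oriented neuron identification more concrete, here is a minimal sketch in plain Python. It is not the paper's scoring formula, which is not given in this summary. As an illustrative assumption, each neuron is scored by how often "activation above its own mean" agrees with "the episode succeeded", and the top-scoring fraction is treated as skill neurons. The function names, the `proportion` parameter, and the toy data are all hypothetical.

```python
from statistics import mean


def skill_neuron_scores(activations, successes):
    """Score each neuron by how well its activation level tracks task success.

    activations[e][n] is the (assumed) average activation of neuron n in episode e.
    successes[e] is True if episode e reached the goal.
    Illustrative scoring rule, not the paper's exact definition.
    """
    num_neurons = len(activations[0])
    scores = []
    for n in range(num_neurons):
        column = [episode[n] for episode in activations]
        baseline = mean(column)
        # Count episodes where "above baseline" matches "succeeded".
        agree = sum((a > baseline) == s for a, s in zip(column, successes))
        accuracy = agree / len(successes)
        # A neuron that is reliably *low* on success is just as informative.
        scores.append(max(accuracy, 1.0 - accuracy))
    return scores


def top_skill_neurons(scores, proportion):
    """Return the indices of the highest-scoring fraction of neurons."""
    k = max(1, round(proportion * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return sorted(ranked[:k])


# Toy data: 4 episodes, 3 neurons. Neuron 0 fires high only on successes.
toy_activations = [
    [0.9, 0.2, 0.5],
    [0.8, 0.7, 0.5],
    [0.1, 0.3, 0.5],
    [0.2, 0.6, 0.5],
]
toy_successes = [True, True, False, False]
toy_scores = skill_neuron_scores(toy_activations, toy_successes)
toy_skill = top_skill_neurons(toy_scores, proportion=0.34)
```

In this toy case only neuron 0 separates successful from failed episodes, so it is the one selected. The proportion of neurons to select is a tunable choice; Figure 4 of the paper reports performance for different proportions.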

Paper Structure

This paper contains 25 sections, 15 equations, 10 figures, 9 tables, 2 algorithms.

Figures (10)

  • Figure 1: Distribution histogram of the activation of a neuron, categorized based on whether the drawer-open task was successfully completed or not.
  • Figure 2: Framework of NBSP. The agent scores and identifies RL skill neurons for each task. While learning new tasks, the gradient of these neurons is masked based on their scores to preserve the encoded skills, while still allowing fine-tuning for new task learning. Additionally, a replay buffer is used to store a portion of the experiences from previous tasks, which is periodically sampled to update the agent, ensuring that knowledge from earlier tasks is retained.
  • Figure 3: Training process of NBSP on the Meta-World benchmark. The segments to the left and right of the dashed line represent the training processes of the first and second cycles, respectively.
  • Figure 4: Performance of NBSP with different proportions of RL skill neurons.
  • Figure 5: Tasks in the Meta-World benchmark used in our experiments.
  • ...and 5 more figures
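
The Figure 2 caption describes two mechanisms: gradients of skill neurons are masked according to their scores, and a replay buffer of earlier-task experience is periodically sampled. The sketch below illustrates both in plain Python under stated assumptions. The soft mask `1 - score` for skill neurons, the fixed `replay_interval` schedule, and all function names are hypothetical choices for illustration, not the paper's exact formulation.

```python
import random


def build_gradient_mask(scores, skill_neurons):
    """Per-neuron gradient multiplier: damp skill neurons, leave others free.

    Assumption: a skill neuron with score s keeps a fraction (1 - s) of its
    gradient, so higher-scoring neurons change less but are not always frozen.
    """
    skill = set(skill_neurons)
    return [1.0 - scores[n] if n in skill else 1.0 for n in range(len(scores))]


def mask_gradients(weight_grads, mask):
    """Scale each neuron's gradient row by its mask value.

    weight_grads[n] holds the gradients of the incoming weights of neuron n.
    """
    return [[mask[n] * g for g in row] for n, row in enumerate(weight_grads)]


def sample_batch(step, current_buffer, replay_buffer, batch_size,
                 replay_interval, rng):
    """Draw from the old-task replay buffer every `replay_interval` steps.

    On all other steps (or if no old experience is stored) the batch comes
    from the current task's buffer.
    """
    use_replay = bool(replay_buffer) and step % replay_interval == 0
    source = replay_buffer if use_replay else current_buffer
    return rng.sample(source, min(batch_size, len(source)))


# Toy usage: neurons 0 and 2 were identified as skill neurons for an old task.
example_mask = build_gradient_mask([1.0, 0.5, 0.75], skill_neurons=[0, 2])
example_grads = mask_gradients([[2.0, 4.0], [2.0, 4.0], [2.0, 4.0]], example_mask)

rng = random.Random(0)
old_task = [("old", i) for i in range(10)]
new_task = [("new", i) for i in range(10)]
replay_step_batch = sample_batch(10, new_task, old_task, 4, 5, rng)
normal_step_batch = sample_batch(11, new_task, old_task, 4, 5, rng)
```

A soft mask is used here because the caption states that masking still allows fine-tuning for the new task; a hard mask that zeroes every skill-neuron gradient would protect old skills more strongly at the cost of plasticity. In an actual deep learning framework the same scaling would be applied to the parameter gradients between the backward pass and the optimizer step.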