Mastering Continual Reinforcement Learning through Fine-Grained Sparse Network Allocation and Dormant Neuron Exploration

Chengqi Zheng, Haiyan Yin, Jianda Chen, Terence Ng, Yew-Soon Ong, Ivor Tsang

TL;DR

SSDE tackles the plasticity-stability dilemma in continual reinforcement learning by partitioning policy parameters into forward-transfer (frozen) and task-specific (trainable) sub-networks via a coarse-to-fine allocation scheme. It combines a co-allocation mechanism with sparse prompts (global and local) to preemptively assign forward-transfer capacity and preserve sufficient trainable capacity, and employs fine-grained masking to enable targeted updates. To counteract expressivity loss in sparse networks, SSDE introduces sensitivity-guided dormant neuron exploration, periodically resetting dormant neurons to boost exploration and adaptation. On CW10-v1, SSDE achieves state-of-the-art stability (95% success) with competitive plasticity, and across CW10-v2 and CW20-v1/v2, it demonstrates robust, scalable performance with efficient sub-network allocation and interpretable mask structures.

Abstract

Continual Reinforcement Learning (CRL) is essential for developing agents that can learn, adapt, and accumulate knowledge over time. However, a fundamental challenge persists as agents must strike a delicate balance between plasticity, which enables rapid skill acquisition, and stability, which ensures long-term knowledge retention while preventing catastrophic forgetting. In this paper, we introduce SSDE, a novel structure-based approach that enhances plasticity through a fine-grained allocation strategy with Structured Sparsity and Dormant-guided Exploration. SSDE decomposes the parameter space into forward-transfer (frozen) parameters and task-specific (trainable) parameters. Crucially, these parameters are allocated by an efficient co-allocation scheme under sparse coding, ensuring sufficient trainable capacity for new tasks while promoting efficient forward transfer through frozen parameters. However, structure-based methods often suffer from rigidity due to the accumulation of non-trainable parameters, limiting exploration and adaptability. To address this, we further introduce a sensitivity-guided neuron reactivation mechanism that systematically identifies and resets dormant neurons, which exhibit minimal influence in the sparse policy network during inference. This approach effectively enhances exploration while preserving structural efficiency. Extensive experiments on the CW10-v1 Continual World benchmark demonstrate that SSDE achieves state-of-the-art performance, reaching a success rate of 95% and significantly surpassing prior methods in the trade-off between plasticity and stability (code is available at: https://github.com/chengqiArchy/SSDE).
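The split into forward-transfer (frozen) and task-specific (trainable) parameters can be illustrated with a masked update step. This is a minimal sketch, not the SSDE implementation: the function name `masked_sgd_step`, the plain SGD rule, and the toy values are all assumptions made for illustration.

```python
def masked_sgd_step(weights, grads, trainable_mask, lr=0.1):
    """Apply a plain SGD step only where trainable_mask is True.

    Entries with a False mask stand in for forward-transfer (frozen)
    parameters and are returned unchanged. (Hypothetical helper.)
    """
    return [
        w - lr * g if is_trainable else w
        for w, g, is_trainable in zip(weights, grads, trainable_mask)
    ]


# Toy parameter vector: positions 0 and 2 are frozen, 1 and 3 are trainable.
weights = [0.5, -1.0, 2.0, 0.0]
grads = [1.0, 1.0, 1.0, 1.0]
trainable_mask = [False, True, False, True]

updated = masked_sgd_step(weights, grads, trainable_mask)
```

Frozen entries keep their values from earlier tasks, which is what protects stability, while the trainable entries are free to adapt to the new task.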

Paper Structure

This paper contains 19 sections, 9 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Plasticity-stability trade-off on CW10-v1 benchmark: Stability refers to how well an agent retains learned knowledge, measured by task performance; plasticity quantifies how quickly an agent adapts to new tasks, measured by the normalized steps required to learn a task (Eq. \ref{equation:metrics}). Our proposed method SSDE achieves state-of-the-art stability of 95%, demonstrating strong capability in mitigating forgetting. For plasticity, SSDE remains competitive with strong Behavior Cloning baselines, despite the latter having access to more data during experience replay.
  • Figure 2: Co-Allocation with Sparse Prompting learns two sets of calibration embeddings, $\bm{\alpha_{k[\Gamma]}}$ and $\bm{\alpha_{k[\Lambda]}}$, which generate neuron-level binary calibration masks $\bm{\phi_{k[\Gamma]}^{(l)}}$ and $\bm{\phi_{k[\Lambda]}^{(l)}}$. The two masks are merged to form $\bm{\phi_k^{(l)}}$, which is multiplied with the output of the $l$-th layer to calibrate it. This layer-wise binary mask specifies a layer-wise sparse sub-net structure, promoting enhanced plasticity. Upper: A global-level sparse coding process learns $\bm{\alpha_{k[\Gamma]}}$ by projecting different task embeddings onto a shared plane of ${\bm{D}}^{(l)}$, assigning similar masks to similar tasks. Lower: A local task-specific prompting process leverages random projection planes to learn $\bm{\alpha_{k[\Lambda]}}$, increasing the capacity available for trainable parameters.
  • Figure 3: Structural Exploration with Dormant Neurons in SSDE: (i) Structural sparsity is achieved by generating a sub-network from neurons co-allocated by two sparse prompting processes ($\Gamma$ and $\Lambda$). (ii) Fine-grained inference is performed on this sub-network, with the trade-off coefficient $\boldsymbol{\beta}$ controlling the balance of trainable and frozen parameters. (iii) For structural exploration, the input of the sparse network is perturbed to maximize the sensitivity of active neurons. Neurons colored blue (acquired for the sub-network policy during co-allocation) are evaluated based on the sensitivity score $c_i^{(l)}$, while inactive neurons, identified as dormant (marked 'D'), are reset to enhance sub-network expressiveness.
  • Figure 4: Evaluation on SSDE's co-allocation vs. CoTASP's sparse prompting on CW10-v1.
  • Figure 5: Visualization of task description similarity (a) and sub-network mask similarity across layers (b-f) for SSDE. The strong alignment between the task description heatmap and sub-network allocation masks across layers indicates that SSDE effectively captures task similarities encoded in the descriptions.
  • ...and 4 more figures
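
The mask construction described in the Figure 2 caption can be sketched as follows. This is a hedged illustration only: the caption does not state how the two masks are merged, so the element-wise OR used here is an assumption, as are the positive-coefficient threshold, the helper names, and the toy values.

```python
def binary_mask(alpha):
    """Turn calibration coefficients into a 0/1 neuron mask.

    Assumption: a neuron is selected when its coefficient is positive.
    """
    return [1 if a > 0 else 0 for a in alpha]


def merge_masks(mask_global, mask_local):
    """Merge the global and local masks with an element-wise OR (assumed rule)."""
    return [g | l for g, l in zip(mask_global, mask_local)]


def calibrate(layer_output, mask):
    """Multiply the layer output by the merged binary mask, neuron by neuron."""
    return [h * m for h, m in zip(layer_output, mask)]


# Toy coefficients for a 4-neuron layer (illustrative values only).
alpha_global = [0.7, 0.0, 0.0, 0.2]   # stands in for alpha_{k[Gamma]}
alpha_local = [0.0, 0.0, 0.4, 0.1]    # stands in for alpha_{k[Lambda]}

phi = merge_masks(binary_mask(alpha_global), binary_mask(alpha_local))
out = calibrate([1.5, -2.0, 0.3, 4.0], phi)
```

With these values neuron 1 is selected by neither prompt, so its output is zeroed, while neuron 2 enters the sub-network only through the local prompt.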

Theorems & Definitions (1)

  • Definition 4.1: Sensitivity-Guided Dormant Score
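
The text of Definition 4.1 is not reproduced on this page, so the following is only a rough sketch of the idea described in the abstract and the Figure 3 caption: score each neuron by how strongly its activation responds to an input perturbation, then reset low-scoring neurons in the active sub-network. The layer-mean normalization, the threshold `tau`, the uniform re-initialization, and all function names are assumptions, not the paper's definition.

```python
import random


def sensitivity_scores(clean_acts, perturbed_acts):
    """Per-neuron mean absolute activation change, normalized by the layer mean.

    clean_acts / perturbed_acts: one list of neuron activations per sample.
    (Assumed scoring rule, for illustration only.)
    """
    n_neurons = len(clean_acts[0])
    n_samples = len(clean_acts)
    raw = [
        sum(abs(p[i] - c[i]) for c, p in zip(clean_acts, perturbed_acts)) / n_samples
        for i in range(n_neurons)
    ]
    layer_mean = sum(raw) / n_neurons
    if layer_mean == 0:
        return [0.0] * n_neurons
    return [r / layer_mean for r in raw]


def dormant_indices(scores, active_mask, tau=0.1):
    """Neurons inside the sub-network whose score is at or below tau."""
    return [i for i, (s, m) in enumerate(zip(scores, active_mask)) if m and s <= tau]


def reset_neurons(weight_rows, indices, rng, scale=0.1):
    """Re-initialize the incoming weights of the selected neurons in place."""
    for i in indices:
        weight_rows[i] = [rng.uniform(-scale, scale) for _ in weight_rows[i]]
    return weight_rows


# Two samples, three neurons; only neuron 0 reacts to the perturbation.
clean = [[1.0, 0.5, 0.0], [2.0, 0.5, 0.0]]
perturbed = [[1.4, 0.5, 0.0], [2.6, 0.5, 0.0]]
active_mask = [1, 1, 0]  # neuron 2 is outside the allocated sub-network

scores = sensitivity_scores(clean, perturbed)
dormant = dormant_indices(scores, active_mask)

weights = [[0.3, 0.3], [0.3, 0.3], [0.3, 0.3]]
reset_neurons(weights, dormant, random.Random(0))
```

In this toy case neuron 1 belongs to the sub-network but does not respond to the perturbation, so it is the only one reset; neuron 2 is also unresponsive but is skipped because it was never allocated.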