Mastering Continual Reinforcement Learning through Fine-Grained Sparse Network Allocation and Dormant Neuron Exploration
Chengqi Zheng, Haiyan Yin, Jianda Chen, Terence Ng, Yew-Soon Ong, Ivor Tsang
TL;DR
SSDE tackles the plasticity-stability dilemma in continual reinforcement learning by partitioning policy parameters into forward-transfer (frozen) and task-specific (trainable) sub-networks via a coarse-to-fine allocation scheme. It combines a co-allocation mechanism with sparse prompts (global and local) to preemptively assign forward-transfer capacity and preserve sufficient trainable capacity, and employs fine-grained masking to enable targeted updates. To counteract expressivity loss in sparse networks, SSDE introduces sensitivity-guided dormant neuron exploration, periodically resetting dormant neurons to boost exploration and adaptation. On CW10-v1, SSDE achieves state-of-the-art stability (95% success) with competitive plasticity, and across CW10-v2 and CW20-v1/v2, it demonstrates robust, scalable performance with efficient sub-network allocation and interpretable mask structures.
Abstract
Continual Reinforcement Learning (CRL) is essential for developing agents that can learn, adapt, and accumulate knowledge over time. However, a fundamental challenge persists: agents must strike a delicate balance between plasticity, which enables rapid skill acquisition, and stability, which ensures long-term knowledge retention and prevents catastrophic forgetting. In this paper, we introduce SSDE, a novel structure-based approach that enhances plasticity through a fine-grained allocation strategy with Structured Sparsity and Dormant-guided Exploration. SSDE decomposes the parameter space into forward-transfer (frozen) parameters and task-specific (trainable) parameters. Crucially, these parameters are allocated by an efficient co-allocation scheme under sparse coding, ensuring sufficient trainable capacity for new tasks while promoting efficient forward transfer through frozen parameters. However, structure-based methods often suffer from rigidity due to the accumulation of non-trainable parameters, which limits exploration and adaptability. To address this, we further introduce a sensitivity-guided neuron reactivation mechanism that systematically identifies and resets dormant neurons, i.e., neurons that exhibit minimal influence in the sparse policy network during inference. This approach effectively enhances exploration while preserving structural efficiency. Extensive experiments on the CW10-v1 Continual World benchmark demonstrate that SSDE achieves state-of-the-art performance, reaching a success rate of 95% and significantly surpassing prior methods on the plasticity-stability trade-off. Code is available at: https://github.com/chengqiArchy/SSDE.
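
As a rough illustration of the two ideas described above, the following minimal sketch shows (i) a gradient step that updates only task-specific (trainable) weights while leaving forward-transfer (frozen) weights untouched, and (ii) a reset of the trainable incoming weights of dormant neurons. It is not the authors' implementation. The function names, the binary mask layout, the per-neuron `scores`, the mean-normalized threshold `tau`, and the uniform re-initialization are all assumptions made for illustration; in particular, the abstract does not specify how the sensitivity score is computed, so it is treated here as a given list of non-negative numbers.

```python
import random

def masked_sgd_step(weights, grads, trainable_mask, lr=0.1):
    """One SGD step on a single layer (rows = output neurons).

    Entries with mask 1 are task-specific and get updated; entries with
    mask 0 are frozen forward-transfer weights and are left unchanged.
    """
    return [
        [w - lr * g * m for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, trainable_mask)
    ]

def dormant_neurons(scores, tau=0.1):
    """Indices of neurons whose score, normalized by the layer mean, is <= tau.

    `scores` is a hypothetical per-neuron sensitivity measure (assumed
    non-negative); the normalization and threshold are illustrative choices.
    """
    mean = sum(scores) / len(scores)
    if mean == 0:
        return list(range(len(scores)))  # every neuron is inactive
    return [i for i, s in enumerate(scores) if s / mean <= tau]

def reset_dormant(weights, trainable_mask, dormant, rng, scale=0.1):
    """Re-initialize only the trainable incoming weights of dormant neurons.

    Frozen entries are preserved so that earlier tasks are not affected.
    """
    new_weights = [row[:] for row in weights]
    for i in dormant:
        for j, m in enumerate(trainable_mask[i]):
            if m:
                new_weights[i][j] = rng.uniform(-scale, scale)
    return new_weights

# Toy layer with 2 output neurons and 2 inputs.
weights = [[1.0, 2.0], [3.0, 4.0]]
grads = [[1.0, 1.0], [1.0, 1.0]]
mask = [[0, 1], [1, 0]]  # 1 = trainable, 0 = frozen

updated = masked_sgd_step(weights, grads, mask, lr=0.5)
dormant = dormant_neurons([0.0, 2.0], tau=0.1)
reset = reset_dormant(updated, mask, dormant, random.Random(0))
```

In this toy example the frozen entries keep their values through both the update and the reset, which is the property that protects previously learned tasks; only the trainable entries of the neuron flagged as dormant are re-drawn.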
