Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning
Hongjoon Ahn, Jinu Hyeon, Youngmin Oh, Bosun Hwang, Taesup Moon
TL;DR
This paper identifies negative transfer as a pervasive obstacle in continual reinforcement learning and demonstrates that common plasticity-loss remedies do not reliably fix it. It introduces Reset & Distill (R&D), a simple two-network baseline that resets the online learner per task and distills knowledge into an offline learner to both erase detrimental prior knowledge and prevent forgetting. Through extensive experiments on Meta World, DeepMind Control Suite, and Atari-100k, R&D consistently outperforms baselines like EWC, P&C, and ClonEx, especially in long task sequences with frequent negative transfer. The results underscore the need to prioritize mitigating negative transfer in CRL, illustrating that robust strategies like R&D can yield substantial gains in transfer safety and sequential task performance.
Abstract
We argue that negative transfer, which occurs when a new task to learn arrives, is an important problem that should not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that this issue arises frequently in CRL and cannot be effectively addressed by several recent works on either mitigating plasticity loss in RL agents or enhancing positive transfer in CRL scenarios. To that end, we develop Reset & Distill (R&D), a simple yet highly effective baseline method, to overcome the negative transfer problem in CRL. R&D combines a strategy of resetting the agent's online actor and critic networks to learn a new task with an offline learning step that distills knowledge from the online actor and the previous expert's action probabilities. We carried out extensive experiments on long sequences of Meta World tasks and show that our simple baseline method consistently outperforms recent approaches, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.
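The two ingredients named in the abstract, resetting the online networks and distilling action probabilities into an offline learner, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete action distributions, a KL divergence from the target probabilities to the offline policy, and plain parameter lists in place of neural networks. The names `kl`, `reset_params`, and `distill_loss` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions (lists of probabilities)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def reset_params(n_params, rng):
    """Stand-in for re-initializing a network: draw fresh random parameters."""
    return [rng.gauss(0.0, 0.1) for _ in range(n_params)]


def distill_loss(offline_policy, current_batch, expert_batch):
    """Mean KL from target action probabilities to the offline policy.

    current_batch: (state, probs) pairs recorded from the online actor on the new task.
    expert_batch:  (state, probs) pairs stored from experts of earlier tasks.
    The KL direction and the equal weighting of both terms are assumptions.
    """
    def mean_kl(batch):
        if not batch:
            return 0.0
        return sum(kl(target, offline_policy(state)) for state, target in batch) / len(batch)

    return mean_kl(current_batch) + mean_kl(expert_batch)


# Toy usage: two actions, integer states.
rng = random.Random(0)
online_actor = reset_params(4, rng)    # reset before learning each new task
online_critic = reset_params(4, rng)

current_batch = [(0, [0.9, 0.1]), (1, [0.2, 0.8])]  # targets from the online actor
expert_batch = [(2, [0.5, 0.5])]                     # targets from an earlier task


def uniform_policy(state):
    """An untrained offline policy that ignores the state."""
    return [0.5, 0.5]


loss = distill_loss(uniform_policy, current_batch, expert_batch)
```

In this sketch the loss is zero only when the offline policy reproduces both sets of target probabilities, which reflects the dual role described above: the current-task term transfers what the freshly reset online actor learned, and the stored-expert term guards against forgetting earlier tasks.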
