Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning

Hongjoon Ahn; Jinu Hyeon; Youngmin Oh; Bosun Hwang; Taesup Moon

TL;DR

This paper identifies negative transfer as a pervasive obstacle in continual reinforcement learning and demonstrates that common plasticity-loss remedies do not reliably fix it. It introduces Reset & Distill (R&D), a simple two-network baseline that resets the online learner per task and distills knowledge into an offline learner to both erase detrimental prior knowledge and prevent forgetting. Through extensive experiments on Meta World, DeepMind Control Suite, and Atari-100k, R&D consistently outperforms baselines like EWC, P&C, and ClonEx, especially in long task sequences with frequent negative transfer. The results underscore the need to prioritize mitigating negative transfer in CRL, illustrating that robust strategies like R&D can yield substantial gains in transfer safety and sequential task performance.

Abstract

We argue that negative transfer, which occurs when a new task to learn arrives, is an important problem that must not be overlooked when developing effective Continual Reinforcement Learning (CRL) algorithms. Through comprehensive experimental validation, we demonstrate that this issue arises frequently in CRL and cannot be effectively addressed by several recent works on either mitigating the plasticity loss of RL agents or enhancing positive transfer in the CRL scenario. To that end, we develop Reset & Distill (R&D), a simple yet highly effective baseline method, to overcome the negative transfer problem in CRL. R&D combines a strategy of resetting the agent's online actor and critic networks to learn a new task with an offline learning step that distills knowledge from the action probabilities of the online actor and of the previous expert. We carried out extensive experiments on long sequences of Meta World tasks and show that our simple baseline method consistently outperforms recent approaches, achieving significantly higher success rates across a range of tasks. Our findings highlight the importance of considering negative transfer in CRL and emphasize the need for robust strategies like R&D to mitigate its detrimental effects.
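The two ingredients named in the abstract, a per-task reset and a distillation step, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear-softmax policy, the KL direction (teacher to student), the equal weighting of the two terms, and every function name below are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, obs):
    """Action probabilities of a toy linear-softmax policy (hypothetical stand-in
    for an actor network); weights[a] holds the weight row for action a."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)


def fresh_weights(rng, n_actions, obs_dim, scale=0.1):
    """'Reset': draw brand-new random weights so the online learner starts each
    task without parameters inherited from earlier tasks."""
    return [[rng.gauss(0.0, scale) for _ in range(obs_dim)] for _ in range(n_actions)]


def kl(p, q, eps=1e-8):
    """KL(p || q) between two categorical distributions."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def mean_kl(offline_w, batch):
    """Average KL from stored target probabilities to the offline policy."""
    if not batch:
        return 0.0  # e.g. no earlier tasks yet
    return sum(kl(target, policy_probs(offline_w, obs)) for obs, target in batch) / len(batch)


def distill_loss(offline_w, new_batch, old_batch):
    """'Distill': match the online actor on the new task (new_batch) and the
    stored expert action probabilities from earlier tasks (old_batch).
    Each batch is a list of (observation, target_probabilities) pairs."""
    return mean_kl(offline_w, new_batch) + mean_kl(offline_w, old_batch)


if __name__ == "__main__":
    rng = random.Random(0)
    online = fresh_weights(rng, n_actions=3, obs_dim=4)   # reset for the new task
    offline = fresh_weights(rng, n_actions=3, obs_dim=4)  # persistent learner
    states = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
    new_batch = [(s, policy_probs(online, s)) for s in states]
    print(distill_loss(offline, new_batch, old_batch=[]))
```

In this toy setting the loss is zero when the offline policy already reproduces its targets and positive otherwise. A real agent would minimize it by gradient descent over replayed states.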

Paper Structure (33 sections, 3 equations, 19 figures, 4 tables, 1 algorithm)

Introduction
Background
Preliminaries
Loss of plasticity in RL
Negative transfer in transfer learning
The negative transfer in CRL
A motivating experiment
Identifying various levels of negative transfers
A simple baseline for addressing the negative transfer in CRL
Experimental evaluation
Two-task fine-tuning experiments with various methods
Evaluation on long sequence of tasks
Analyses on negative transfer and forgetting
Conclusion
Algorithm
...and 18 more sections

Figures (19)

Figure 1: The success rates of SAC and PPO on (a) push-wall and (b) window-close tasks.
Figure 2: Results on continual fine-tuning SAC (top) and PPO (bottom) on 3 tasks. (a) Success rates with various methods. (b) Various indicators of the plasticity loss of the models across the three tasks.
Figure 3: Results of the 3-task experiment with P&C variants, utilizing SAC.
Figure 4: Negative transfer patterns for the two-task fine-tuning in Meta World with (a) SAC and (b) PPO, when tasks from Plate, Push and Sweep groups are learned as the first (left) or the second (right) task.
Figure 5: Negative transfer patterns in DeepMind Control environment with SAC.
...and 14 more figures
