DisCoRL: Continual Reinforcement Learning via Policy Distillation
René Traoré, Hugo Caselles-Dupré, Timothée Lesort, Te Sun, Guanghang Cai, Natalia Díaz-Rodríguez, David Filliat
TL;DR
DisCoRL tackles continual reinforcement learning by combining state representation learning with policy distillation to sequentially acquire policies and consolidate them into a single robust policy without task labels. The method learns task-specific SRL encoders, trains policies in the learned representation space, and distills them into a memory-efficient, unified policy using soft labels. It demonstrates near-teacher performance across three sequential tasks in simulation and transfers effectively to a real robot, addressing sim-to-real gaps via domain randomization and robust SRL. The work offers a practical, scalable approach to continual RL in robotics and identifies avenues for refining SRL updates and memory efficiency.
Abstract
Multi-task reinforcement learning poses two main challenges: at training time, learning different policies with a single model; at test time, inferring which of those policies to apply without an external signal. Continual reinforcement learning adds a third challenge: learning tasks sequentially without forgetting the previous ones. In this paper, we tackle these challenges with DisCoRL, an approach that combines state representation learning and policy distillation. We experiment on a sequence of three simulated 2D navigation tasks with a three-wheel omnidirectional robot. We also test the robustness of our approach by transferring the final policy to a real-life setting. The policy can solve all tasks and automatically infer which one to run.
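As a rough illustration of the policy distillation step, the sketch below scores a single student policy against soft labels (teacher action probabilities) pooled from several tasks, with no task label attached to any sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the KL-divergence loss, the function names (`softmax`, `kl_divergence`, `distillation_loss`), the `student_fn` callable, and the toy data are all hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_probs, student_probs, eps=1e-8):
    """KL(teacher || student); assumed here as the distillation loss."""
    return sum(
        t * (math.log(max(t, eps)) - math.log(max(s, eps)))
        for t, s in zip(teacher_probs, student_probs)
    )

def distillation_loss(dataset, student_fn):
    """Mean KL between teacher soft labels and the student's output.

    dataset: list of (observation, teacher_probs) pairs pooled from all
             tasks; no task identifier is stored (hypothetical layout).
    student_fn: callable mapping an observation to action logits.
    """
    losses = [
        kl_divergence(teacher_probs, softmax(student_fn(obs)))
        for obs, teacher_probs in dataset
    ]
    return sum(losses) / len(losses)

# Toy pooled dataset: two observations, each labeled by a different teacher.
dataset = [
    ([0.0], [0.7, 0.2, 0.1]),
    ([1.0], [0.1, 0.1, 0.8]),
]

def uniform_student(obs):
    """Untrained student: identical logits for every action."""
    return [0.0, 0.0, 0.0]

loss = distillation_loss(dataset, uniform_student)
```

Because the loss depends only on observations and teacher outputs, minimizing it pushes the student to reproduce each teacher's behavior on that teacher's own observations, which is one way a single policy could learn to pick the right behavior without being told the task.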
