Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning

Aneesh Muppidi; Zhiyu Zhang; Heng Yang

TL;DR

TRAC is a parameter-free optimizer for lifelong RL that requires no tuning or prior knowledge about the distribution shifts. It works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, even though the underlying optimization problem is nonconvex and nonstationary.

Abstract

A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.

Paper Structure (28 sections, 2 equations, 19 figures, 4 tables)

Introduction
Contribution
Organization
Lifelong RL
Lifelong RL as online optimization
Lifelong vs. Continual
Method
Basics of (parameter-free) OCO
Connection to regularization
On the hyperparameters
Experiment
Discussion
Conclusion
Acknowledgments
Trac Encourages Positive Transfer
...and 13 more sections

Figures (19)

Figure 1: Severe loss of plasticity in Procgen (StarPilot). There is a steady decline in reward with each distribution shift.
Figure 2: Visualization of Trac's key idea.
Figure 3: Experimental setup for lifelong RL.
Figure 4: Reward in the lifelong Procgen environments for StarPilot, Dodgeball, Fruitbot, and Chaser. There is a steady loss of plasticity in agents using Adam PPO and CReLU, characterized by their inability to maintain performance through successive Procgen levels. In contrast, Trac avoids this loss of plasticity, quickly achieving high performance with each new task.
Figure 5: Reward in the lifelong Atari environments, across games with action spaces of 6 and 9. Trac PPO rapidly adapts to new tasks, in contrast to Adam PPO and CReLU, which struggle to achieve high reward, indicating mild loss of plasticity.
...and 14 more figures
