CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning
John Birkbeck, Adam Sobey, Federico Cerutti, Katherine Heseltine Hurley Flynn, Timothy J. Norman
TL;DR
This work addresses the problem of predicting how environmental changes affect reinforcement learning performance by introducing Change-Induced Regret Proxies (CHIRPs). It formalizes Scaled Optimal Policy Regret (SOPR) and proposes the $W_1$-MDP distance as a practical CHIRP that approximates SOPR from transition samples. Experiments in SimpleGrid and MetaWorld show a positive, monotonic relationship between CHIRP values and SOPR, and show that CHIRP-driven policy reuse (CPR) can markedly outperform existing lifelong RL methods, including in interleaved-task settings. Calibration via spline fitting further enables cross-environment comparisons, giving a scalable framework for predicting and mitigating the impact of change in lifelong reinforcement learning.
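To illustrate what estimating a Wasserstein-1 quantity from samples can look like, the following is a minimal sketch and not the paper's definition of the $W_1$-MDP distance, which is not given in this summary. It assumes each state-action pair has an equal number of sampled scalar outcomes (for example, rewards) from the MDP before and after a change, and it averages the per-pair 1-D $W_1$ distances. The function names `w1_1d` and `mean_w1_over_pairs` and the dictionary layout are hypothetical.

```python
from statistics import fmean


def w1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, the optimal coupling pairs sorted values.
    return fmean(abs(x - y) for x, y in zip(sorted(xs), sorted(ys)))


def mean_w1_over_pairs(samples_a, samples_b):
    """Average the per-(state, action) W1 over pairs present in both dicts.

    ASSUMPTION: each dict maps a hypothetical (state, action) key to a list
    of scalar outcome samples; this is an illustrative layout only.
    """
    shared = sorted(set(samples_a) & set(samples_b))
    if not shared:
        raise ValueError("no shared (state, action) pairs")
    return fmean(w1_1d(samples_a[k], samples_b[k]) for k in shared)


# Hypothetical samples: only the ("s0", "right") outcomes change.
before = {("s0", "left"): [0.0, 1.0], ("s0", "right"): [1.0, 1.0]}
after = {("s0", "left"): [0.0, 1.0], ("s0", "right"): [3.0, 3.0]}
distance = mean_w1_over_pairs(before, after)
```

Under these assumptions, a larger value indicates that sampled outcomes shifted further between the two MDPs. Handling multi-dimensional next states would require a general optimal-transport solver, which this sketch omits.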
Abstract
Reinforcement learning (RL) agents are costly to train and fragile to environmental changes. They often perform poorly when faced with many changing tasks, which hinders their widespread deployment in the real world. Many lifelong RL agent designs have been proposed to mitigate issues such as catastrophic forgetting or to exhibit desirable characteristics such as forward transfer when change occurs. However, no prior work has established whether the impact of a change on agent performance can be predicted from the change itself. Understanding this relationship would help agents proactively mitigate a change's impact and improve learning performance. We propose Change-Induced Regret Proxy (CHIRP) metrics to link change to drops in agent performance, and we use two environments to demonstrate a CHIRP's utility in lifelong learning. A simple CHIRP-based agent achieved $48\%$ higher performance than the next best method in one benchmark and attained the best success rates in 8 of 10 tasks in a second benchmark, which proved difficult for existing lifelong RL agents.
