Towards Continual Reinforcement Learning: A Review and Perspectives

Khimya Khetarpal, Matthew Riemer, Irina Rish, Doina Precup

TL;DR

The paper surveys continual reinforcement learning by formalizing non-stationarity through a two-axis taxonomy (scope and driver) and offering a unifying view that generalizes existing CRL formulations. It categorizes methods into explicit knowledge retention, shared-structure approaches, and meta-learning, detailing representative techniques like rehearsal, distillation, modular architectures, state abstractions, goals, and auxiliary tasks. It also covers evaluation practices, benchmarks, and robust metrics for forward/backward transfer and skill reuse, arguing for richer, principled CRL benchmarks. Finally, it discusses neuroscience-inspired directions and open problems needed to bridge the gap between current CRL methods and real-world deployment.

Abstract

In this article, we aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We begin by discussing our perspective on why RL is a natural fit for studying continual learning. We then provide a taxonomy of different continual RL formulations by mathematically characterizing two key properties of non-stationarity, namely, the scope and the driver of non-stationarity. This offers a unified view of various formulations. Next, we review and present a taxonomy of continual RL approaches. We go on to discuss evaluation of continual RL agents, providing an overview of benchmarks used in the literature and important metrics for understanding agent performance. Finally, we highlight open problems and challenges in bridging the gap between the current state of continual RL and findings in neuroscience. While still in its early days, the study of continual RL has the promise to develop better incremental reinforcement learners that can function in increasingly realistic applications where non-stationarity plays a vital role. These include applications such as those in the fields of healthcare, education, logistics, and robotics.

Paper Structure

This paper contains 62 sections, 4 theorems, 21 equations, 9 figures.

Key Result

Proposition 1

A non-stationary MDP is a special type of CRL problem where $\alpha \subseteq \{ \mathcal{S}, \mathcal{A}, r, p \}$, the observation function is an appropriate identity matrix $x = \mathbb{I}$, and the observation space is the state space $\mathcal{O} = \mathcal{S}$.
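As a rough illustration of Proposition 1, the sketch below builds a toy non-stationary MDP in which only the reward function changes over time (so the scope is $\alpha = \{ r \}$) and the observation function is the identity, so the agent observes the state directly ($\mathcal{O} = \mathcal{S}$). The class name `DriftingChainMDP`, the chain layout, and the `switch_every` drift period are hypothetical choices made for this example; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class DriftingChainMDP:
    """Toy non-stationary MDP (illustrative assumption, not from the paper).

    States are positions 0..n_states-1 on a chain. The transition function is
    stationary; only the reward r_t depends on the global time step t.
    """
    n_states: int = 5
    switch_every: int = 100  # hypothetical drift period, in time steps
    t: int = 0               # global time step that drives the drift
    state: int = 2

    def goal(self) -> int:
        # The rewarding end of the chain alternates every `switch_every` steps.
        phase = (self.t // self.switch_every) % 2
        return self.n_states - 1 if phase == 0 else 0

    def reward(self, state: int) -> float:
        # Time-dependent reward: 1 at the current goal, 0 elsewhere.
        return 1.0 if state == self.goal() else 0.0

    def observe(self) -> int:
        # Identity observation function: the observation is the state itself.
        return self.state

    def step(self, action: int):
        # action is -1 (left) or +1 (right); moves are clipped at the ends.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        r = self.reward(self.state)
        self.t += 1
        return self.observe(), r


env = DriftingChainMDP(n_states=5, switch_every=3)
trajectory = [env.step(+1) for _ in range(4)]
```

In this sketch the drift depends only on the time step and not on the agent's behaviour. An agent that keeps moving right collects reward until the goal flips, after which the same policy earns nothing, which is the kind of change a continual learner has to detect and adapt to.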

Figures (9)

  • Figure 1: Agent-Environment Interaction with Potentially Time Dependent Environment Components. Extending Figure 3.1 of Sutton98 to highlight the agent-environment interaction in continual reinforcement learning.
  • Figure 2: A Spectrum of Learning Settings: For each setting we consider whether they typically involve multiple domains, multiple skills, online learning, resource efficiency/sustainability and a non-stationary evolution of the task distribution.
  • Figure 3: Reinforcement Learning and the Stability-Plasticity Dilemma: A) Depicts the stability-plasticity dilemma and its relation to both weight sharing and transfer dynamics over time (from MER). B) Depicts the forward view of RL where we evaluate the current state based on expected future rewards (from Sutton98). C) Depicts the backward view of RL where we leverage recent states and rewards to correct our evaluations of past states (from Sutton98).
  • Figure 4: Taxonomy of Continual RL Formalisms: Problem formulations in continual RL can be categorized along two primary dimensions: 1) the scope of the non-stationarity $\alpha$ and 2) the driver of non-stationarity $\beta$. Coupled with the scope and the driver of the non-stationarity, assumptions about the non-stationary functional forms ($f$) and shared structure ($a$) can result in different CRL formulations (Propositions 1, 2, and 3). This view offers a unified perspective resulting in continual reinforcement learning as a strict generalization of most of the existing formulations in the literature (Proposition 4).
  • Figure 5: Taxonomy of Continual RL Approaches: A diagram illustrating different clusters of approaches for continual RL, highlighting prominent threads of research within each family. Though these categories are not mutually exclusive, we examine each separately for the purpose of this paper.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Proposition 1: Non-stationary MDPs as CRL Problems
  • Proposition 2: Non-stationary MDP and POMDP Duality
  • Proposition 3: Active Markov Games as CRL Problems
  • Proposition 4: CRL as Strict Generalization