Plasticity Loss in Deep Reinforcement Learning: A Survey

Timo Klein, Lukas Miklautz, Kevin Sidak, Claudia Plant, Sebastian Tschiatschek

TL;DR

This survey provides an overview of the emerging research on plasticity loss for academics and practitioners of deep reinforcement learning. It proposes a unified definition of plasticity loss based on recent works, relates it to definitions from the literature, and discusses metrics for measuring plasticity loss.

Abstract

Akin to neuroplasticity in human brains, the plasticity of deep neural networks enables their quick adaptation to new data. This makes plasticity particularly crucial for deep Reinforcement Learning (RL) agents: Once plasticity is lost, an agent's performance will inevitably plateau because it cannot improve its policy to account for changes in the data distribution, which are a necessary consequence of its learning process. Thus, developing well-performing and sample-efficient agents hinges on their ability to remain plastic during training. Furthermore, the loss of plasticity can be connected to many other issues plaguing deep RL, such as training instabilities, scaling failures, overestimation bias, and insufficient exploration. With this survey, we aim to provide an overview of the emerging research on plasticity loss for academics and practitioners of deep reinforcement learning. First, we propose a unified definition of plasticity loss based on recent works, relate it to definitions from the literature, and discuss metrics for measuring plasticity loss. Then, we categorize and discuss numerous possible causes of plasticity loss before reviewing currently employed mitigation strategies. Our taxonomy is the first systematic overview of the current state of the field. Lastly, we discuss prevalent issues within the literature, such as the need for broader evaluation, and provide recommendations for future research, such as gaining a better understanding of an agent's neural activity and behavior.

Paper Structure

This paper contains 88 sections, 36 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Gradient covariance structure at different time steps on Atari. For the game Freeway, the gradient covariance matrix displays a pronounced structure at one million steps; by 3.5M steps, the structure has become less noticeable. For SpaceInvaders at the same two time steps, the structure is less pronounced. The structure of the gradient covariances and its evolution thus depend on the particular game/environment.
  • Figure 2: Possible connections between factors and causes of plasticity loss in value-based RL. Large-mean regression targets combined with non-stationarity of deep RL training cause large and unstable gradients, leading to an increase in parameter norms. Large parameter norms are known to increase loss sharpness and cause other pathologies, together leading to reduced agent performance.
  • Figure 3: Visualization of categorical losses for deep RL. The two-hot representation [TwoHotMuZero] proportionally assigns probability mass to the two neighboring bins of a scalar target $y$. HL-Gauss [HL-GaussOriginal] constructs a Gaussian with fixed standard deviation and integrates over each bin to obtain the corresponding probability mass. Distributional RL algorithms such as C51 [C51DistributionalRL] model the full return distribution. Detailed descriptions of these methods are given in the mitigation subsection on loss reformulation. Figure taken from [StopRegressing].
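
The two target encodings described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the implementation used in the cited works: the function names `two_hot` and `hl_gauss`, the bin layout, and the value of `sigma` are hypothetical choices made for illustration, and the sketch assumes that the HL-Gauss Gaussian is centered on the scalar target and renormalized over the supported range. C51 is not covered, since it models a full return distribution rather than encoding a scalar.

```python
import math
from bisect import bisect_right


def two_hot(y, centers):
    """Two-hot encoding: split the mass for scalar y between its two
    neighboring bin centers, in proportion to proximity.

    `centers` is an increasing list of at least two bin centers
    (an illustrative assumption about the bin layout).
    """
    y = min(max(y, centers[0]), centers[-1])  # clip y to the supported range
    probs = [0.0] * len(centers)
    hi = min(bisect_right(centers, y), len(centers) - 1)  # upper neighbor
    lo = hi - 1                                           # lower neighbor
    w_hi = (y - centers[lo]) / (centers[hi] - centers[lo])
    probs[lo] = 1.0 - w_hi
    probs[hi] = w_hi
    return probs


def hl_gauss(y, edges, sigma):
    """HL-Gauss-style encoding: integrate a Gaussian with mean y and fixed
    standard deviation sigma over each bin [edges[i], edges[i+1]].

    Mass falling outside the outermost edges is removed by renormalizing
    (an assumption of this sketch).
    """
    def cdf(x):
        # Gaussian CDF with mean y and standard deviation sigma
        return 0.5 * (1.0 + math.erf((x - y) / (sigma * math.sqrt(2.0))))

    cdf_vals = [cdf(e) for e in edges]
    total = cdf_vals[-1] - cdf_vals[0]
    return [(b - a) / total for a, b in zip(cdf_vals, cdf_vals[1:])]


# Hypothetical layout: five bins with centers 0..4 and unit width.
centers = [0.0, 1.0, 2.0, 3.0, 4.0]
edges = [-0.5, 0.5, 1.5, 2.5, 3.5, 4.5]

two_hot_target = two_hot(1.25, centers)
hl_gauss_target = hl_gauss(1.25, edges, sigma=0.75)
```

Either vector can serve as the target of a cross-entropy loss in place of a scalar regression target. The two-hot vector has exactly two nonzero entries and its expectation over the bin centers recovers the clipped target, whereas the HL-Gauss vector spreads mass over several neighboring bins, with the spread controlled by `sigma`.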

Theorems & Definitions (3)

  • Definition 1: Effective rank [FeatureRankKumar]
  • Definition 2: Feature rank [InFeRUnderstandingCapacityLoss]
  • Definition 3: Loss of plasticity