
When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

Claas Voelcker, Tyler Kastner, Igor Gilitschenski, Amir-massoud Farahmand

TL;DR

We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning under linear model assumptions, in the presence of distractions and observation functions.

Abstract

We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning, and study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning in the presence of distractions and observation functions under linear model assumptions. With this formalization, we are able to explain why latent self-prediction is a helpful \emph{auxiliary task}, while observation reconstruction can provide more useful features when used in isolation. Our empirical analysis shows that the insights obtained from our learning dynamics framework predict the behavior of these loss functions beyond the linear model assumption, in non-linear neural networks. This reinforces the usefulness of the linear model framework not only for theoretical analysis but also for practical benefit in applied problems.
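To make the three objectives concrete, here is a minimal sketch of the linear setting the analysis works in. The shapes, the stop-gradient convention, and the value head below are illustrative assumptions, not the paper's exact parameterization: features are $\phi(x) = \Phi^\top x$, a latent model $F$ predicts next-state features, and a decoder maps latents back to observations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_feat = 8, 3

Phi = rng.normal(size=(d_obs, d_feat))   # linear encoder: phi(x) = Phi.T @ x
F   = rng.normal(size=(d_feat, d_feat))  # latent transition model
Psi = rng.normal(size=(d_obs, d_feat))   # linear decoder back to observations
w   = rng.normal(size=d_feat)            # linear value head

x, x_next = rng.normal(size=d_obs), rng.normal(size=d_obs)  # one transition
r, gamma = 1.0, 0.99

z, z_next = Phi.T @ x, Phi.T @ x_next

# Latent self-prediction: predict next-state features phi(x') from phi(x);
# the target z_next is typically held fixed (stop-gradient) during updates.
loss_latent = np.sum((F @ z - z_next) ** 2)

# Observation reconstruction: decode the predicted latent back to x'.
loss_recon = np.sum((Psi @ F @ z - x_next) ** 2)

# TD learning with a linear value function v(x) = w @ phi(x);
# the bootstrap target uses the current value estimate at x'.
td_target = r + gamma * (w @ z_next)
loss_td = (td_target - w @ z) ** 2
```

In the auxiliary-task setup (Figure 1), the feature-learning loss is added to the TD loss and both gradients flow into $\Phi$; in the stand-alone setup, only the feature-learning loss updates $\Phi$.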

Paper Structure

This paper contains 25 sections, 27 theorems, 36 equations, 12 figures, and 3 tables.

Key Result

Proposition 1

Assume Assumptions 1 and 2 hold. Furthermore, suppose $P^\pi$ is real-diagonalizable. If the columns of $\Phi_t$ span an invariant subspace of $P^\pi$, $\Phi_t$ is a stationary point of the dynamical system. Furthermore, if $P^\pi$ is real-diagonalizable with positive eigenvalues, all invari…
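As a numerical illustration of the invariant-subspace condition (not the paper's exact dynamical system, which depends on its assumptions about the state distribution and stop-gradients), the sketch below builds a real-diagonalizable $P^\pi$ with positive eigenvalues, takes $\Phi$ to span an invariant subspace, and checks that the best-fit latent model reproduces the restricted dynamics exactly, so the self-prediction residual vanishes and $\Phi$ receives no gradient signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# A real-diagonalizable P^pi with positive eigenvalues: P = V D V^{-1}.
V = rng.normal(size=(n, n))
D = np.diag(rng.uniform(0.1, 1.0, size=n))
P = V @ D @ np.linalg.inv(V)

# Phi spans an invariant subspace: the span of k eigenvectors of P.
Phi = V[:, :k]

# Best-fit latent model F in feature space (least squares).
F = np.linalg.pinv(Phi) @ P @ Phi

# On an invariant subspace the restricted dynamics are exact, so the
# latent self-prediction residual is zero up to floating-point error.
residual = P @ Phi - Phi @ F
print(np.linalg.norm(residual))  # ~1e-14 (numerical zero)
```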

Figures (12)

  • Figure 1: Diagram of the considered loss functions and the different use cases. In latent self-prediction, the aim is to predict the next-state features, given by an embedding function $\phi(x')$, using the state features $\phi(x)$ and a latent prediction model $F$. In observation reconstruction, the aim is to match the next-state ground-truth observation $x'$ via a decoder function $\psi(F(\phi(x)))$. In the auxiliary task setup, the gradients from both the feature learning loss and value function learning are propagated to the encoder, while in the stand-alone scenario, only the gradients from the feature learning loss are used to update $\Phi$.
  • Figure 2: Auxiliary task setup: performance of all losses on the observation space as given, without changes to the environment.
  • Figure 3: Stand-alone setup: performance of all losses on the observation space as given, without changes to the environment. The DQN baseline uses fixed random features that are not updated, to verify that learning features is indeed superior to a random-feature baseline.
  • Figure 4: Distorted observation function with a random transformation.
  • Figure 5: Appending random noise channels. Both distraction variants are sketched after this list.
  • ...and 7 more figures
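The two environment modifications in Figures 4 and 5 are straightforward to reproduce. The sketch below is an assumed construction, not necessarily the paper's exact transformations: a fixed random linear map distorting the observation, and i.i.d. noise dimensions appended as distractors, in the spirit of Definition 1's factored-MDP view of a distraction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_noise = 8, 4

# Figure 4: distort observations with a fixed random linear map.
A = rng.normal(size=(d_obs, d_obs))

def distort(x):
    """Apply a fixed random transformation to the observation."""
    return A @ x

# Figure 5: append noise channels that are independent of the task.
def add_noise_channels(x):
    """Concatenate i.i.d. Gaussian distractor dimensions to the observation."""
    return np.concatenate([x, rng.normal(size=d_noise)])

x = rng.normal(size=d_obs)
print(distort(x).shape)             # (8,)
print(add_noise_channels(x).shape)  # (12,)
```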

Theorems & Definitions (43)

  • Definition 1: A factored MDP model of a distraction
  • Proposition 1: Stationary points of latent self-prediction
  • Proposition 2: Stationary points of reconstruction
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Proposition 7: Suboptimality of top $k$ eigenspaces with distractions
  • Definition 2: Top-$k$ singular vectors (sketched after this list)
  • Lemma 1: Spectrum of Kronecker product matrix
  • ...and 33 more
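For reference, the object in Definition 2 can be computed directly. The sketch below extracts the top-$k$ left singular vectors of a matrix with numpy; it only illustrates the definition, not any result of the paper.

```python
import numpy as np

def top_k_singular_vectors(M, k):
    """Return the top-k left singular vectors of M as columns, ordered by
    descending singular value (the subspace referenced in Definition 2)."""
    U, S, Vt = np.linalg.svd(M)
    return U[:, :k]
```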