Do Agents Dream of Electric Sheep?: Improving Generalization in Reinforcement Learning through Generative Learning

Giorgio Franceschelli, Mirco Musolesi

TL;DR

This work uses imagination-based reinforcement learning to train a policy on dream-like episodes, where non-imaginative, predicted trajectories are modified through generative augmentations, and shows the method can reach a higher level of generalization when dealing with sparsely rewarded environments.

Abstract

The Overfitted Brain hypothesis suggests dreams happen to allow generalization in the human brain. Here, we ask if the same is true for reinforcement learning agents as well. Given limited experience in a real environment, we use imagination-based reinforcement learning to train a policy on dream-like episodes, where non-imaginative, predicted trajectories are modified through generative augmentations. Experiments on four ProcGen environments show that, compared to classic imagination and offline training on collected experience, our method can reach a higher level of generalization when dealing with sparsely rewarded environments.

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: At imagination time, we start from a random latent state and then rely only on the predictive capabilities of our world model to obtain future latent states (the concatenation of a discrete latent vector and a recurrent hidden state), rewards, and termination bits, given the actions from the agent. To introduce a dream-like transformation, we modify the current latent state with a small probability by applying one of three operations: interpolating it with random noise; applying DeepDream to its corresponding observation from the decoder by maximizing the activation of the encoder's last convolutional layer; or optimizing it to maximize the absolute value of the critic's output.
  • Figure 2: An example of the three generative augmentations on a state from the Plunder environment.
  • Figure 3: Total rewards received on all possible levels by classic Dreamer varying the source of initial states for imagination (randomly generated or collected from real environments). The vertical line separates the day training (common to all methods) from the night training. Results report average and confidence intervals across 5 seeds.
  • Figure 4: Total rewards received on all possible levels by our variants and by the two baselines. The vertical line separates the day training (common to all methods) from the night training. Results report average and confidence intervals across 5 seeds.
  • Figure 5: Total rewards received on all the levels by our variants considering the transformations separately and together with random uniform probability. The vertical line separates the day training (common to all methods) from the night training. Results report average and confidence intervals across 5 seeds.
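
The imagination loop described in the Figure 1 caption can be illustrated with a minimal sketch. It covers only the first of the three transformations (interpolation with random noise) and is not the authors' implementation: the names `interpolate_with_noise`, `imagine`, `toy_step`, and `toy_policy`, the parameters `p_transform` and `alpha`, their default values, the use of Gaussian noise, and the flat list-of-floats latent state are all assumptions made for illustration. In the paper the latent state concatenates a discrete latent vector with a recurrent hidden state, and the world model and policy are learned networks rather than the toy stand-ins used here.

```python
import math
import random


def interpolate_with_noise(latent, rng, alpha=0.5):
    """Blend each latent component with Gaussian noise (alpha=0 leaves it unchanged)."""
    return [(1.0 - alpha) * z + alpha * rng.gauss(0.0, 1.0) for z in latent]


def imagine(initial_latent, world_model_step, policy, horizon, rng,
            p_transform=0.1, alpha=0.5):
    """Roll out an imagined trajectory, perturbing the latent with small probability.

    world_model_step(latent, action) -> (next_latent, reward, done) and
    policy(latent) -> action are hypothetical callables standing in for the
    learned world model and the agent.
    """
    latent = list(initial_latent)
    trajectory = []
    for _ in range(horizon):
        # Dream-like transformation, applied with probability p_transform.
        if rng.random() < p_transform:
            latent = interpolate_with_noise(latent, rng, alpha)
        action = policy(latent)
        next_latent, reward, done = world_model_step(latent, action)
        trajectory.append((latent, action, reward, done))
        if done:
            break
        latent = next_latent
    return trajectory


# Toy stand-ins so the sketch runs end to end (not a real world model or agent).
def toy_step(latent, action):
    next_latent = [math.tanh(z + 0.1 * action) for z in latent]
    reward = sum(next_latent) / len(next_latent)
    return next_latent, reward, False


def toy_policy(latent):
    return 1.0 if sum(latent) > 0 else -1.0


rng = random.Random(0)
start = [rng.gauss(0.0, 1.0) for _ in range(8)]  # random initial latent state
rollout = imagine(start, toy_step, toy_policy, horizon=5, rng=rng)
```

The other two transformations (DeepDream on the decoded observation and critic-driven optimization of the latent) would slot into the same branch as alternative perturbations, but they require gradients through the encoder, decoder, and critic, so they are omitted from this sketch.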