Synthetic Experience Replay

Cong Lu; Philip J. Ball; Yee Whye Teh; Jack Parker-Holder

TL;DR

SynthER introduces a diffusion-based method that upsamples an RL agent's replay buffer by generating synthetic transitions, addressing data scarcity in offline RL and improving sample efficiency online without algorithmic changes. The approach matches or improves on baselines across diverse proprioceptive and pixel-based tasks, enables training with much smaller offline datasets and larger networks, and extends to image observations by operating on latent representations. Offline results show faithful modeling of the data distribution and compression benefits, while online results show substantially higher usable update-to-data ratios with competitive runtime. The work suggests that synthetic training data generated by diffusion can unlock the full potential of replay-based RL under limited data, and the authors provide open-source code for community use.

Abstract

A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to collect its own data, which is often limited. Thus, it is challenging to reap the benefits of deep learning, and even small neural networks can overfit at the start of training. In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience. We show that SynthER is an effective method for training RL agents across offline and online settings, in both proprioceptive and pixel-based environments. In offline settings, we observe drastic improvements when upsampling small offline datasets and see that additional synthetic data also allows us to effectively train larger networks. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a significant increase in sample efficiency, without any algorithmic changes. We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data. Finally, we open-source our code at https://github.com/conglu1997/SynthER.
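As a rough illustration of the idea in the abstract, the sketch below mixes real and synthetic transitions in each training batch and performs several gradient updates per environment step (the update-to-data ratio). It is a minimal sketch, not the authors' implementation: the function names, the `update_fn` callback, and the 50/50 `synthetic_fraction` default are assumptions made for illustration, and the synthetic buffer is assumed to have been filled beforehand by some generative model.

```python
import random


def sample_mixed_batch(real, synthetic, batch_size, synthetic_fraction, rng):
    """Draw a batch containing both real and synthetic transitions.

    `synthetic_fraction` is a hypothetical knob: the share of the batch
    taken from the synthetic buffer.
    """
    n_syn = round(batch_size * synthetic_fraction)
    n_real = batch_size - n_syn
    # Sample with replacement from each buffer, then shuffle the result.
    batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_syn)
    rng.shuffle(batch)
    return batch


def run_updates(env_steps, utd_ratio, real, synthetic, update_fn, rng,
                batch_size=256, synthetic_fraction=0.5):
    """Perform `utd_ratio` updates per environment step on mixed batches.

    Environment interaction and generator retraining are omitted; only the
    update schedule is shown. Returns the total number of updates performed.
    """
    n_updates = 0
    for _ in range(env_steps):
        # (Assumed) one real transition would be collected here.
        for _ in range(utd_ratio):
            batch = sample_mixed_batch(
                real, synthetic, batch_size, synthetic_fraction, rng
            )
            update_fn(batch)  # hypothetical agent update, e.g. one SAC step
            n_updates += 1
    return n_updates


# Usage with toy data: each transition is tagged with its source.
rng = random.Random(0)
real_buffer = [("real", i) for i in range(100)]
synthetic_buffer = [("synthetic", i) for i in range(1000)]
total = run_updates(10, 20, real_buffer, synthetic_buffer, lambda b: None, rng)
```

With a UTD ratio of 20, ten environment steps trigger 200 updates; the synthetic buffer supplies the extra data those updates draw on.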

Paper Structure (34 sections, 4 equations, 8 figures, 13 tables, 1 algorithm)

Introduction
Background
Reinforcement Learning
Offline Reinforcement Learning
Diffusion Models
Synthetic Experience Replay
Offline SynthER
Online SynthER
Empirical Evaluation
Offline Evaluation
Upsampling for Small Datasets
Why is SynthER better than explicit augmentation?
Scaling Network Size
Online Evaluation
Scaling to Pixel-Based Observations
...and 19 more sections

Figures (8)

Figure 1: Upsampling data using SynthER greatly outperforms explicit data augmentation schemes for small offline datasets and data-efficient algorithms in online RL without any algorithmic changes. Moreover, synthetic data from SynthER may readily be added to any algorithm utilizing experience replay. Full results are given in the Empirical Evaluation section.
Figure 2: SynthER generates synthetic samples using a diffusion model, which we visualize on the proprioceptive walker2d environment. On the top row, we render the state component of the transition tuple on a subset of samples; on the bottom row, we visualize a t-SNE projection of 100,000 samples. The denoising process creates cohesive and plausible transitions whilst also remaining diverse, as seen by the multiple clusters that form at the end of the process in the bottom row.
Figure 3: SynthER is a powerful method for upsampling reduced variants of the walker2d datasets and vastly improves on competitive explicit data augmentation approaches for both the TD3+BC (top) and IQL (bottom) algorithms. The subsampling levels are scaled proportionally to the original size of each dataset. We show the mean and standard deviation of the final performance averaged over 8 seeds.
Figure 4: Comparing L2 distance from training data and dynamics accuracy under SynthER and augmentations.
Figure 5: SynthER greatly improves the sample efficiency of online RL algorithms by enabling an agent to train on upsampled data. This allows an agent to use an increased update-to-data ratio (UTD=20 compared to 1 for regular SAC) without any algorithmic changes. We show the mean and standard deviation of the online return over 6 seeds. DeepMind Control Suite environments are shown in the top row, and OpenAI Gym environments are shown in the bottom.
...and 3 more figures
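The "L2 distance from training data" in the Figure 4 caption can be read as a nearest-neighbour measure: for each generated transition, the Euclidean distance to its closest training transition. The sketch below is one plausible way to compute it, assuming transitions are flattened into equal-length numeric tuples; the function name and the brute-force search are illustrative choices, not taken from the paper.

```python
import math


def nearest_l2_distances(samples, training_data):
    """For each sample, return the L2 distance to its nearest training point.

    Brute-force O(len(samples) * len(training_data)); adequate for a sketch.
    A distance near zero suggests the sample copies the data, while a very
    large distance suggests it lies far from the data distribution.
    """
    return [
        min(math.dist(sample, point) for point in training_data)
        for sample in samples
    ]


# Usage with toy 2-D "transitions".
train = [(0.0, 0.0), (10.0, 10.0)]
generated = [(3.0, 4.0), (10.0, 10.0)]
distances = nearest_l2_distances(generated, train)
```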
