
A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

Guozheng Ma, Zhen Wang, Zhecheng Yuan, Xueqian Wang, Bo Yuan, Dacheng Tao

TL;DR

The paper introduces HCMDP as a unified framework to formalize visual RL and identifies two key DA motivations: optimality invariance for sample efficiency and prior-based diversity for generalization. It provides a principled taxonomy of DA techniques across observation, transition, and trajectory augmentations, augmented by automatic, context-aware, and generative extensions. Through systematic benchmarks on Atari, DMControl, Procgen, and other suites, the survey demonstrates that DA can substantially improve sample efficiency and zero-shot generalization, with methods like DrQ, SPR, CURL, DRIBO, SVEA, and PlayVirtual exemplifying the range of successful strategies. The discussion emphasizes semantic-level DA, stability-generalization trade-offs, and the unique role of DA in mitigating plasticity loss in visual RL, while cautioning about limitations and the evolving influence of foundation models on future research directions.

Abstract

Visual reinforcement learning (RL), which makes decisions directly from high-dimensional visual inputs, has demonstrated significant potential in various domains. However, deploying visual RL techniques in the real world remains challenging due to their low sample efficiency and large generalization gaps. To tackle these obstacles, data augmentation (DA) has become a widely used technique in visual RL for acquiring sample-efficient and generalizable policies by diversifying the training data. This survey aims to provide a timely and essential review of DA techniques in visual RL in recognition of the thriving development in this field. In particular, we propose a unified framework for analyzing visual RL and understanding the role of DA in it. We then present a principled taxonomy of the existing augmentation techniques used in visual RL and conduct an in-depth discussion on how to better leverage augmented data in different scenarios. Moreover, we report a systematic empirical evaluation of DA-based techniques in visual RL and conclude by highlighting the directions for future research. As the first comprehensive survey of DA in visual RL, this work is expected to offer valuable guidance to this emerging field.

Paper Structure

This paper contains 72 sections, 21 equations, 22 figures, and 5 tables.

Figures (22)

  • Figure 1: The generic workflow diagram for leveraging DA in visual RL.
  • Figure 2: The schematic structure of this survey.
  • Figure 3: The agent-environment interaction loop of visual RL and an example of HCMDP.
  • Figure 4: A graphical model of the emission function of an HCMDP (a) compared with three other representative MDP variants: (b) the $(f, g)$-scheme, (c) Block MDP, and (d) BC-MDP.
  • Figure 5: Optimality-Invariant augmentation.
  • ...and 17 more figures