Revisiting Data Augmentation in Deep Reinforcement Learning

Jianshu Hu; Yunpeng Jiang; Paul Weng

TL;DR

This work analyzes data augmentation for image-based DRL through a variance-centric lens on $Q$-targets and actor/critic losses, and unifies existing methods under a principled off-policy actor–critic framework. It develops a generic algorithm that combines multiple image transformations with explicit KL regularization and tangent prop to promote invariance, while carefully considering when to apply transformations in target calculations. The authors provide theoretical connections between explicit and implicit regularization, justify architectural choices, and report state-of-the-art performance in most environments, with higher sample efficiency and better generalization in some complex ones, including DeepMind Control tasks. The study emphasizes the importance of learning critic invariance and controlling variance through augmentation strategies, offering practical guidance for principled data augmentation in DRL and highlighting the trade-offs between target bias and variance.
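The variance argument on $Q$-targets can be illustrated with a small sketch. It assumes a toy, non-invariant critic (a fixed linear map over pixels, with the action argument omitted) and a random-shift transformation; the names `random_shift`, `averaged_target`, and `target_variance` are hypothetical and not taken from the paper. It only shows that averaging the target over several transformed next observations lowers its empirical variance.

```python
import numpy as np

def random_shift(img, pad, rng):
    """Pad an (H, W) image by replicating edges, then crop back at a random offset."""
    h, w = img.shape
    padded = np.pad(img, pad, mode="edge")
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    return padded[dy:dy + h, dx:dx + w]

def averaged_target(reward, next_obs, q_fn, k, rng, gamma=0.99, pad=2):
    """TD target r + gamma * mean over k random shifts of Q(f(s'))."""
    q_vals = [q_fn(random_shift(next_obs, pad, rng)) for _ in range(k)]
    return reward + gamma * float(np.mean(q_vals))

rng = np.random.default_rng(0)
next_obs = rng.random((16, 16))
weights = rng.normal(size=(16, 16))  # assumption: toy critic, not invariant to shifts
q_fn = lambda img: float(np.sum(weights * img))

def target_variance(k, trials=2000):
    """Empirical variance of the target when k transformations are averaged."""
    samples = [averaged_target(1.0, next_obs, q_fn, k, rng) for _ in range(trials)]
    return float(np.var(samples))

var_k1 = target_variance(1)
var_k8 = target_variance(8)
print(var_k1, var_k8)
```

With independent transformations, the variance of the averaged target shrinks roughly in proportion to the number of transformations; a critic that is already invariant would show no variance from this source at all.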

Abstract

Various data augmentation techniques have been recently proposed in image-based deep reinforcement learning (DRL). Although they empirically demonstrate the effectiveness of data augmentation for improving sample efficiency or generalization, which technique should be preferred is not always clear. To tackle this question, we analyze existing methods to better understand them and to uncover how they are connected. Notably, by expressing the variance of the Q-targets and that of the empirical actor/critic losses of these methods, we can analyze the effects of their different components and compare them. We furthermore formulate an explanation about how these methods may be affected by choosing different data augmentation transformations in calculating the target Q-values. This analysis suggests recommendations on how to exploit data augmentation in a more principled way. In addition, we include a regularization term called tangent prop, previously proposed in computer vision, but whose adaptation to DRL is novel to the best of our knowledge. We evaluate our proposition and validate our analysis in several domains. Compared to different relevant baselines, we demonstrate that it achieves state-of-the-art performance in most environments and shows higher sample efficiency and better generalization ability in some complex environments.
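As a rough illustration of the tangent prop idea from computer vision, the sketch below penalizes the squared directional derivative of a toy critic along the tangent of a one-pixel horizontal shift. Everything here is an assumption for illustration: the critic `tanh(<w, s>)`, the circular shift, and the names `q_and_grad`, `shift_tangent`, and `tangent_prop_penalty` are hypothetical, and the paper's actual adaptation to DRL may differ.

```python
import numpy as np

def q_and_grad(obs, weights):
    """Toy critic Q(s) = tanh(<w, s>) and its gradient with respect to s."""
    q = np.tanh(np.sum(weights * obs))
    return q, (1.0 - q ** 2) * weights

def shift_tangent(obs):
    """Finite-difference tangent of a circular horizontal shift (step of 1 pixel)."""
    return np.roll(obs, 1, axis=1) - obs

def tangent_prop_penalty(obs, weights):
    """Squared directional derivative of Q along the shift tangent."""
    _, grad = q_and_grad(obs, weights)
    return float(np.sum(grad * shift_tangent(obs)) ** 2)

rng = np.random.default_rng(0)
obs = rng.random((8, 8))
w_random = rng.normal(size=(8, 8)) * 0.1
# Weights constant along each row make <w, s> unchanged by circular horizontal shifts.
w_invariant = np.tile(rng.normal(size=(8, 1)) * 0.1, (1, 8))

penalty_random = tangent_prop_penalty(obs, w_random)
penalty_invariant = tangent_prop_penalty(obs, w_invariant)
print(penalty_random, penalty_invariant)
```

The penalty vanishes (up to floating-point error) for the shift-invariant weights and is positive for generic weights, which is the behavior a regularizer promoting critic invariance is meant to reward.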

Paper Structure (67 sections, 4 theorems, 95 equations, 14 figures, 8 tables, 1 algorithm)

Introduction
Contributions:
Related Work
Background
Notations
Invariant Transformations
Explicit regularization
Implicit regularization
Theoretical Discussion
Critic Loss
Actor Loss
Principled Data-Augmented Off-Policy Actor-Critic Algorithm
Generic Algorithm
Justification for the Generic Algorithm
Applying image transformations in calculating the target
...and 52 more sections

Key Result

Lemma 1

(Proof in the appendix on the connection between explicit and implicit critic regularization; all detailed derivations and proofs are in the appendix.) There exist distributions for $\hat{\nu}$ and $\hat{\mu}$ such that we have for any sample $(s, a, r, s')$:

Figures (14)

Figure 1: Aggregated results of validating our propositions.
Figure 2: Performance of different methods in walker run environment.
Figure 2: Comparison of SVEA and ours in DMControl with video-hard backgrounds for training 500k env steps.
Figure 3: Partial results of evaluating sample efficiency for our methods.
Figure 4: Performance of increasing the number of updates with/without using random conv in calculating the targets.
...and 9 more figures

Theorems & Definitions (10)

Definition 1: $\pi$-invariance
Definition 2: $Q$-invariance
Lemma 1
Proposition 4.1
Proposition 5.1
Proposition 5.2
proof
proof
proof
proof
