
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

Abdulaziz Almuzairee, Nicklas Hansen, Henrik I. Christensen

TL;DR

This work addresses the fragility of visual Q-learning under strong data augmentations by identifying limitations in prior approaches that assume full embedding invariance to augmentations. It introduces SADA, a generalized, stabilized actor-critic augmentation recipe that applies augmentations to both actor and critic inputs in an asymmetric but training-efficient manner, enabling robust performance across geometric and photometric transformations. Through extensive experiments on DMControl, Meta-World, and the new DMC-GB2 benchmark, SADA demonstrates improved training stability and generalization, including superior geometric robustness, while preserving sample efficiency. The authors also show that SADA generalizes to other backbones like TD-MPC2 and provide open-source benchmarks and code to advance visual RL research.

Abstract

Q-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 - our proposed extension of the popular DMControl Generalization Benchmark - as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. For visualizations, code, and the benchmark, see https://aalmuzairee.github.io/SADA/

Paper Structure

This paper contains 27 sections, 7 equations, 61 figures, 1 table, 1 algorithm.

Figures (61)

  • Figure 1: Augmentation Effect on CNN Output. We illustrate how the output embedding of a trained CNN changes with respect to image augmentations. The outputs for unaugmented and photometrically augmented images are identical due to the ability of a CNN to learn color invariances. However, the output of a CNN is generally not invariant to geometric augmentations (e.g., rotation).
  • Figure 2: Our approach. Overview of SADA applied to a generic actor-critic algorithm. We highlight our algorithmic contributions in yellow. SADA selectively applies augmentations to the actor and critic inputs, and modifies the learning objectives accordingly.
  • Figure 3: Overall Robustness. (Top) Samples from the DMC-GB2 test distributions, divided into geometric and photometric test sets. (Bottom) Episode reward on DMC-GB2 when trained under all (geometric and photometric) augmentations, averaged across all DMControl tasks. Mean and 95% CI over 5 seeds.
  • Figure 4: Geometric vs Photometric Robustness. Episode reward averaged over all DMControl tasks. (Top) Trained under geometric augmentations and evaluated on DMC-GB2 geometric test set. (Bottom) Trained under photometric augmentations and evaluated on DMC-GB2 photometric test set. All hard levels visualized. Mean and 95% CI over 5 random seeds.
  • Figure 5: Actor Prediction Variance. Actor prediction variance between augmented and unaugmented observations. $\mathbf{\downarrow}$ Lower is better.
  • ...and 56 more figures
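
The captions for Figures 1 and 5 contrast photometric and geometric augmentations and refer to a discrepancy between actor predictions on augmented and unaugmented observations. The following is a minimal toy sketch of that idea, not the paper's implementation. Every name in it (`photometric_aug`, `geometric_aug`, `toy_actor`, `prediction_discrepancy`) is hypothetical, the specific augmentations (brightness scaling, 90-degree rotations) are illustrative choices, and the mean-squared-difference metric is an assumption that may differ from the quantity actually plotted in Figure 5. The toy actor is invariant to brightness by construction (it normalizes its input), standing in for the color invariance that the Figure 1 caption says a CNN can learn.

```python
import numpy as np

def photometric_aug(obs, rng):
    """Hypothetical photometric augmentation: random per-image brightness scaling."""
    scale = rng.uniform(0.5, 1.5, size=(obs.shape[0], 1, 1, 1))
    return obs * scale

def geometric_aug(obs, rng):
    """Hypothetical geometric augmentation: rotate each (C, H, W) image by 90, 180, or 270 degrees."""
    ks = rng.integers(1, 4, size=obs.shape[0])
    return np.stack([np.rot90(img, k=int(k), axes=(1, 2)) for img, k in zip(obs, ks)])

def toy_actor(obs, weights):
    """Stand-in policy (assumption): L2-normalize pixels, then a linear map and tanh."""
    flat = obs.reshape(obs.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    return np.tanh(flat @ weights)

def prediction_discrepancy(actor, obs, aug_obs):
    """Assumed metric: mean squared difference between actions on augmented and clean inputs."""
    return float(np.mean((actor(aug_obs) - actor(obs)) ** 2))

rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 1.0, size=(16, 3, 8, 8))      # batch of square toy images
weights = rng.normal(size=(3 * 8 * 8, 2))            # 2-dimensional action space

def actor(x):
    return toy_actor(x, weights)

photo = prediction_discrepancy(actor, obs, photometric_aug(obs, rng))
geo = prediction_discrepancy(actor, obs, geometric_aug(obs, rng))
print(f"photometric discrepancy: {photo:.3e}")
print(f"geometric discrepancy:   {geo:.3e}")
```

In this toy setting the photometric discrepancy is numerically negligible, while the geometric discrepancy is not, because rotation permutes pixel positions that the linear map treats differently. A trained CNN actor would need to be substituted for `toy_actor` to say anything about a real agent.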