A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Abdulaziz Almuzairee, Nicklas Hansen, Henrik I. Christensen
TL;DR
This work addresses the fragility of visual Q-learning under strong data augmentation by identifying limitations in prior approaches, which assume that embeddings are fully invariant to augmentations. It introduces SADA, a generalized, stabilized actor-critic augmentation recipe that applies augmentations to both actor and critic inputs in an asymmetric but training-efficient manner, enabling robust performance across geometric and photometric transformations. In experiments on DMControl, Meta-World, and the new DMC-GB2 benchmark, SADA improves training stability and generalization, including stronger robustness to geometric transformations, while preserving sample efficiency. The authors also show that SADA generalizes to other backbones such as TD-MPC2, and they release open-source benchmarks and code to advance visual RL research.
Abstract
Q-learning algorithms are appealing for real-world applications due to their data efficiency, but they are prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation and identify an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with a wider variety of augmentations. We benchmark its effectiveness on DMC-GB2, our proposed extension of the popular DMControl Generalization Benchmark, as well as on tasks from Meta-World and the Distracting Control Suite, and find that SADA greatly improves the training stability and generalization of RL agents across a diverse set of augmentations. Visualizations, code, and the benchmark are available at https://aalmuzairee.github.io/SADA/
