Measuring Visual Generalization in Continuous Control from Pixels
Jake Grigsby, Yanjun Qi
TL;DR
This paper tackles visual generalization in pixel-based continuous control by introducing DMCR, a benchmark that injects wide visual variation into the DeepMind Control Suite via visual seeds while keeping dynamics fixed. It shows that state-of-the-art representation learning methods struggle to generalize across diverse visuals, whereas data augmentation (especially color-altering transforms) substantially improves generalization, with Network Randomization achieving near-perfect cross-visual performance in some tasks. The work also analyzes which visual factors are most challenging and provides an encoder-regularization strategy that promotes augmentation invariance, offering practical guidance for building perceptually robust image-based controllers. Overall, DMCR provides a reproducible, scalable platform for evaluating and advancing visual generalization in continuous control settings that resemble real-world conditions.
Abstract
Self-supervised learning and data augmentation have significantly reduced the performance gap between state-based and image-based reinforcement learning agents in continuous control tasks. However, it is still unclear whether current techniques can cope with the variety of visual conditions found in real-world environments. We propose a challenging benchmark that tests agents' visual generalization by adding graphical variety to existing continuous control domains. Our empirical analysis shows that current methods struggle to generalize across a diverse set of visual changes, and we examine the specific factors of variation that make these tasks difficult. We find that data augmentation techniques outperform self-supervised learning approaches and that stronger image transformations provide better visual generalization. The benchmark and our augmented actor-critic implementation are open-sourced at https://github.com/QData/dmc_remastered.
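To make the notion of a color-altering augmentation concrete, the following is a minimal NumPy sketch of one simple transform of that kind: scaling each RGB channel of each observation by an independent random factor. It is an illustration only and not the implementation used in the paper or its repository; the function name `random_color_scale`, the scale range, and the 84x84 frame size are all assumptions chosen for the example.

```python
import numpy as np


def random_color_scale(frames, rng, low=0.5, high=1.5):
    """Scale each RGB channel of each image by an independent random factor.

    Hypothetical helper for illustration (not the paper's augmentation).
    frames: uint8 array of shape (batch, height, width, 3).
    Returns a uint8 array of the same shape.
    """
    batch_size = frames.shape[0]
    # One factor per image and per channel, broadcast over all pixels.
    scales = rng.uniform(low, high, size=(batch_size, 1, 1, 3))
    scaled = frames.astype(np.float32) * scales
    # Keep pixel values in the valid 8-bit range.
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Example usage on a batch of random frames (the 84x84 size is an assumption).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 84, 84, 3), dtype=np.uint8)
augmented = random_color_scale(batch, rng)
```

Because the transform changes only pixel colors, the underlying state and dynamics that produced each frame are untouched, which mirrors the benchmark's premise that visuals vary while the control problem stays the same.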
