The Distracting Control Suite -- A Challenging Benchmark for Reinforcement Learning from Pixels

Austin Stone, Oscar Ramirez, Kurt Konolige, Rico Jonschkowski

TL;DR

This work introduces the Distracting Control Suite, an extension of the DM Control benchmark that injects three kinds of visual distraction (camera pose shifts, object color changes, and background video variations) in static and dynamic modes with tunable difficulty. It systematically evaluates state-of-the-art pixel-based RL methods (SAC and QT-Opt) with RAD and DrQ augmentations, revealing that distractions, especially when combined, significantly degrade performance and that training with distractions yields limited robustness gains. The experiments show that background distractions are particularly challenging and that method rankings shift under distraction, with QT-Opt variants outperforming SAC in many distracted settings. The authors provide actionable insights and a publicly available benchmark to drive future development of robust vision-based control for real-world robotics.

Abstract

Robots have to face challenging perceptual settings, including changes in viewpoint, lighting, and background. Current simulated reinforcement learning (RL) benchmarks such as DM Control provide visual input without such complexity, which limits the transfer of well-performing methods to the real world. In this paper, we extend DM Control with three kinds of visual distractions (variations in background, color, and camera pose) to produce a new challenging benchmark for vision-based control, and we analyze state-of-the-art RL algorithms in these settings. Our experiments show that current RL methods for vision-based control perform poorly under distractions, and that their performance decreases with increasing distraction complexity, showing that new methods are needed to cope with the visual complexities of the real world. We also find that combinations of multiple distraction types are more difficult than a mere combination of their individual effects.

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The Distracting Control Suite. The six tasks (one per row) are shown at increasing levels of difficulty (columns). From left to right, camera and color distractors are shown in 0.1 increments from 0 to 1. The number of backgrounds per column is increased from 0 to 1 and then doubles at each column after that up to a maximum of 60. The first column shows the no distractions benchmark. The second column showcases the easy benchmark on one of the 4 available background videos. The third column is our medium benchmark. Current state-of-the-art methods stop learning effective policies at this point.
  • Figure 2: Specification of camera pose range.
  • Figure 3: Blending between the original skybox and the distracting background with $\beta_\text{bg}\in[0,1]$.
  • Figure 4: Evaluating with each distraction type after training without distractions. Distraction intensities $\in [0,1]$ (see Sect. \ref{sec:suite}). Lines show means over all 6 tasks. Colors denote methods, solid/dashed lines are results in the static/dynamic setting.
  • Figure 5: Effect of distraction magnitude when distractions are present during training and evaluation. Same legend as above.
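
The blending described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes a plain per-pixel linear interpolation in which $\beta_\text{bg}=0$ keeps the original skybox and $\beta_\text{bg}=1$ shows only the distracting background; the caption does not state the direction or the exact formula, and the function name `blend_background` is hypothetical rather than part of the released benchmark.

```python
import numpy as np


def blend_background(skybox: np.ndarray, distractor: np.ndarray, beta_bg: float) -> np.ndarray:
    """Linearly blend two uint8 images of identical shape (H, W, 3).

    Assumption (not confirmed by the source): beta_bg = 0 returns the
    original skybox and beta_bg = 1 returns the distracting background.
    """
    if not 0.0 <= beta_bg <= 1.0:
        raise ValueError("beta_bg must lie in [0, 1]")
    if skybox.shape != distractor.shape:
        raise ValueError("skybox and distractor must have the same shape")

    # Interpolate in float32 to avoid uint8 overflow, then round back.
    mixed = (1.0 - beta_bg) * skybox.astype(np.float32) \
        + beta_bg * distractor.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


# Hypothetical 4x4 frames: a black skybox and a uniform gray distractor.
skybox = np.zeros((4, 4, 3), dtype=np.uint8)
distractor = np.full((4, 4, 3), 200, dtype=np.uint8)
half_blend = blend_background(skybox, distractor, 0.5)
```

Under this assumption, intermediate values of $\beta_\text{bg}$ leave the original scene partly visible behind the distractor, which is one way a single scalar can tune background difficulty.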