DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors

Joseph Ortiz, Antoine Dedieu, Wolfgang Lehrach, Swaroop Guntupalli, Carter Wendelken, Ahmad Humayun, Guangyao Zhou, Sivaramakrishnan Swaminathan, Miguel Lázaro-Gredilla, Kevin Murphy

TL;DR

Pretrained representations do not help policy learning on DMC-VB, and a large representation gap remains between policies learned from pixel observations and policies learned from states. When expert data is limited, however, policy learning can benefit from representations pretrained on suboptimal data and on tasks with stochastic hidden goals.

Abstract

Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents by avoiding the need for expensive online learning. Despite strong generalization in some respects, agents are often remarkably brittle to minor visual variations in control-irrelevant factors such as the background or camera viewpoint. In this paper, we present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents for solving continuous control tasks from visual input in the presence of visual distractors. In contrast to prior works, our dataset (a) combines locomotion and navigation tasks of varying difficulties, (b) includes static and dynamic visual variations, (c) considers data generated by policies with different skill levels, (d) systematically returns pairs of state and pixel observations, (e) is an order of magnitude larger, and (f) includes tasks with hidden goals. Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods. First, we find that pretrained representations do not help policy learning on DMC-VB, and we highlight a large representation gap between policies learned on pixel observations and on states. Second, we demonstrate that when expert data is limited, policy learning can benefit from representations pretrained on (a) suboptimal data, and (b) tasks with stochastic hidden goals. Our dataset and benchmark code to train and evaluate agents are available at: https://github.com/google-deepmind/dmc_vision_benchmark.

Paper Structure (43 sections, 7 equations, 19 figures, 4 tables)

This paper contains 43 sections, 7 equations, 19 figures, 4 tables.

Figures (19)

  • Figure 1: DeepMind Control Vision Benchmark.
  • Figure 2: Reward distribution for different behavioral policy levels in DMC-VB. Note the log scale on the vertical axis. Statistics of these distributions are summarized in Appendix \ref{sec:appendix_data}.
  • Figure 3: Online evaluation scores on the locomotion tasks [top three rows] and ant maze navigation tasks [bottom row] of DMC-VB, averaged over $30$ trajectories, with standard errors. Higher reward is better. For locomotion task rows, results are grouped by distractor type and demonstration data quality. For the ant maze row, results are grouped by maze difficulty and demonstration data quality. NULL + BC is the best overall method. Pretrained representations offer no advantage with or without visual distractors. LFD + BC performs poorly, AE + BC and DINO + BC learn moderate policies, and ID + BC is comparable to NULL + BC. Full scores are in Appendix \ref{sec:appendix_b1_results} and the temporal evolution of rewards through training is plotted in Appendix \ref{sec:appendix_time_series}.
  • Figure 4: [Left] Least-squares test error for reconstructing the observations [top] and states [bottom], averaged over the different DMC-VB locomotion tasks and policies. Lower is better. As results are averaged over $150\text{k}$ samples, the standard errors are too small to be visible. See Appendix \ref{sec:appendix_b1_state_obs_rec} for a detailed breakdown per task and distractor. [Right] Observation reconstruction examples. See Appendix \ref{sec:appendix_b1_obs_examples} for additional image reconstructions. BC and ID both (a) discard visual distractors (background object and agent color), and (b) reach the lowest state reconstruction errors.
  • Figure 5: Pretraining encoders on mixed data improves performance when a BC policy is trained on a small expert dataset. BC and ID pretraining perform similarly. For each task, performance is reported as the proportion of reward obtained by BC on full expert data without distractors (higher is better). Full results, including cheetah and LFD/AE pretraining, are in Appendix \ref{sec:benchmark2_locomotion_appendix}.
  • ...and 14 more figures
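
The captions of Figures 3 and 5 rely on two summary statistics: a mean with standard error over evaluation trajectories, and a score expressed as a proportion of a reference score. The sketch below shows one common way to compute both. It is illustrative only and is not taken from the benchmark code: the function names are hypothetical, the numbers in the usage note are made up, and the use of the sample variance (dividing by n - 1) is an assumption.

```python
import math
from typing import Sequence, Tuple


def mean_and_standard_error(returns: Sequence[float]) -> Tuple[float, float]:
    """Return the sample mean and the standard error of the mean.

    Assumption: the standard error uses the unbiased sample variance
    (division by n - 1); the paper may use a different convention.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two trajectories")
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean, math.sqrt(variance / n)


def normalized_score(score: float, reference_score: float) -> float:
    """Express a score as a proportion of a reference score.

    In the Figure 5 caption the reference is the reward obtained by BC
    on full expert data without distractors.
    """
    if reference_score == 0:
        raise ValueError("reference score must be non-zero")
    return score / reference_score
```

For example, with made-up episode returns, `mean_and_standard_error([1.0, 2.0, 3.0])` gives a mean of 2.0 and a standard error of about 0.577, and `normalized_score(450.0, 900.0)` gives 0.5.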