D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Rafael Rafailov, Kyle Hatch, Anikait Singh, Laura Smith, Aviral Kumar, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip Ball, Jiajun Wu, Chelsea Finn, Sergey Levine

TL;DR

Offline reinforcement learning seeks to learn from fixed datasets without environment interaction during training, but evaluating progress requires realistic benchmarks that reflect real-world tasks. D5RL provides a diverse robotics-focused benchmark with three domains (A1, Franka, WidowX) and multiple observation modalities (state and vision), and it supports both offline evaluation and online finetuning from offline initialization. Baseline experiments reveal that current offline RL methods struggle on vision-based and multi-stage robotic tasks, with simple behavioral cloning (BC) occasionally outperforming more complex approaches, underscoring the need for better algorithms and representations. By offering rich datasets, varied data sources, and practical evaluation protocols, D5RL aims to accelerate progress in offline RL and offline-to-online learning for real-world robotics.

Abstract

Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online finetuning to overcome challenges with exploration. However, evaluating progress on offline RL algorithms requires effective and challenging benchmarks that capture properties of real-world tasks, provide a range of task difficulties, and cover a range of challenges both in terms of the parameters of the domain (e.g., length of the horizon, sparsity of rewards) and the parameters of the data (e.g., narrow demonstration data or broad exploratory data). While considerable progress in offline RL in recent years has been enabled by simpler benchmark tasks, the most widely used datasets are increasingly saturating in performance and may fail to reflect properties of realistic tasks. We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments, based on models of real-world robotic systems, and comprising a variety of data sources, including scripted data, play-style data collected by human teleoperators, and other data sources. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation, with some of the tasks specifically designed to require both pre-training and fine-tuning. We hope that our proposed benchmark will facilitate further progress on both offline RL and fine-tuning algorithms. A website with code, examples, tasks, and data is available at https://sites.google.com/view/d5rl/

Paper Structure (25 sections, 1 equation, 6 figures, 4 tables)

Figures (6)

  • Figure 1: A visualization of the environments in our proposed benchmark. We provide datasets for training locomotion policies for the A1 robot (left), learning manipulation in randomized vision-based kitchen-like environments with a Franka robotic arm (middle), and learning multi-stage pick-and-place tasks with a WidowX low-cost robotic manipulator (right). Each domain is accompanied by several datasets with different properties and evaluates a distinct aspect of offline RL and offline training with online finetuning.
  • Figure 2: Hiking task. The A1 robot at the start of the course in front of a randomized terrain.
  • Figure 3: Observations for the Standard Franka Kitchen tasks consist of two $64\times 64$ RGB images, one from a top-down camera and one from a wrist camera, as well as robot proprioception.
  • Figure 4: Observations from the Randomized Kitchen environment consist of two $128\times 128$ RGB images from side cameras, one $128\times 128$ RGB image from a wrist camera, and robot proprioception. The environment includes several different types of kettles and microwaves, which require different grasps. Moreover, their locations are randomized across the scene. Textures, lighting conditions, and camera angles are also varied across episodes.
  • Figure 5: The setup for the Multi-Stage Manipulation with Scripted Data tasks consists of a simulated WidowX arm with two identical bins. In the center of the scene are two objects, each categorized as a shoe or a toy, which the agent has to sort into their respective bins.
  • ...and 1 more figure
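
The observation layouts described in the captions of Figures 3 and 4 can be summarized in a short Python sketch. This is only an illustration under stated assumptions, not the benchmark's actual API: the class `ObservationSpec`, the camera key names, and the (height, width, channels) ordering are hypothetical, and the proprioception dimension is not given in the captions, so it is left as a caller-supplied parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationSpec:
    """Shapes of one observation. Hypothetical helper, not part of D5RL."""
    image_shapes: dict  # camera name -> (height, width, channels); names are assumed
    proprio_dim: int    # not stated in the captions; must be supplied by the caller

    def flat_size(self) -> int:
        """Total number of scalar values in one observation."""
        pixels = sum(h * w * c for h, w, c in self.image_shapes.values())
        return pixels + self.proprio_dim


def standard_kitchen_spec(proprio_dim: int) -> ObservationSpec:
    # Figure 3: two 64x64 RGB images (top-down and wrist cameras) plus proprioception.
    return ObservationSpec(
        image_shapes={"top_down": (64, 64, 3), "wrist": (64, 64, 3)},
        proprio_dim=proprio_dim,
    )


def randomized_kitchen_spec(proprio_dim: int) -> ObservationSpec:
    # Figure 4: two 128x128 side-camera images and one 128x128 wrist image,
    # plus proprioception.
    return ObservationSpec(
        image_shapes={
            "side_0": (128, 128, 3),
            "side_1": (128, 128, 3),
            "wrist": (128, 128, 3),
        },
        proprio_dim=proprio_dim,
    )
```

Under these assumptions, the Randomized Kitchen observation carries six times as many pixel values as the Standard Franka Kitchen observation (three cameras at four times the per-image resolution versus two cameras), which gives a rough sense of the added input size in the vision-based setting.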