Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning

Marcel Hussing, Jorge A. Mendez, Anisha Singrodia, Cassandra Kent, Eric Eaton

TL;DR

This work addresses the challenge of learning robust robotic manipulation policies from offline data by introducing large-scale, compositional RL datasets derived from CompoSuite, totaling $256$ million transitions across $256$ tasks per dataset. It evaluates baseline offline RL methods and compositional variants, finding that while compositional architectures can improve training performance, they still fail to generalize to unseen tasks, underscoring the need for algorithms that explicitly capture modular structure. The contributions include four datasets with diverse data quality (Expert, Medium, Warmstart, Medium-Replay), detailed training/test splits, and multiple training regimes to probe compositional generalization, highlighting a clear gap between current offline RL capabilities and the potential of compositional representations. The work has practical impact by enabling systematic study of offline compositional RL for robotics and guiding future research toward modular, transferable RL policies, with implications for pre-training on large robotic datasets and subsequent zero-shot or few-shot adaptation.

Abstract

Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the $256$ tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of $256$ million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments show that current offline RL methods can learn the training tasks to some extent and that compositional methods outperform non-compositional methods. Yet current methods are unable to extract the compositional structure to generalize to unseen tasks, highlighting a need for future research in offline compositional RL.

Paper Structure (16 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Examples of four CompoSuite tasks, showing each task's initial state. Each task is composed of one element from each of four compositional axes: a robot (IIWA, Jaco, Gen3, or Panda) manipulates an object (box, hollow_box, plate, or dumbbell) while avoiding an obstacle (no_obstacle, object_door, goal_wall, or object_wall) to achieve a specific objective (pick_and_place, push, trash_can, or shelf). Images from Mendez et al. (2022a).
  • Figure 2: An overview of our dataset creation and training process. Manipulation tasks vary along four compositional dimensions, as taken from CompoSuite. Trajectories are sampled from pre-trained PPO agents, forming four different datasets of varying difficulties (Section \ref{sec:datasetdesc}). Three different training settings (Section \ref{sec:trainlists}) provide different views into these data for training, and evaluation of the learned policies is performed on the CompoSuite simulator using MuJoCo.
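
The task count follows directly from the four compositional axes listed in the Figure 1 caption: four choices on each axis give $4^4 = 256$ tasks. The short Python sketch below illustrates this. It is not the CompoSuite API; the function names `enumerate_tasks` and `shared_axes` are hypothetical and exist only for this example, and the per-task transition count assumes that each dataset's $256$ million transitions are spread evenly over its tasks, which the text above does not state explicitly.

```python
from itertools import product

# Axis values copied from the Figure 1 caption.
ROBOTS = ["IIWA", "Jaco", "Gen3", "Panda"]
OBJECTS = ["box", "hollow_box", "plate", "dumbbell"]
OBSTACLES = ["no_obstacle", "object_door", "goal_wall", "object_wall"]
OBJECTIVES = ["pick_and_place", "push", "trash_can", "shelf"]


def enumerate_tasks():
    """Return every (robot, object, obstacle, objective) combination."""
    return list(product(ROBOTS, OBJECTS, OBSTACLES, OBJECTIVES))


def shared_axes(task_a, task_b):
    """Count the axes on which two tasks use the same element.

    Hypothetical helper: one simple way to read the abstract's
    'notion of task relatedness' (0 = nothing shared, 4 = same task).
    """
    return sum(a == b for a, b in zip(task_a, task_b))


tasks = enumerate_tasks()
print(len(tasks))  # 256

# Assumption: transitions are distributed evenly across tasks.
transitions_per_dataset = 256_000_000
print(transitions_per_dataset // len(tasks))  # 1000000

print(shared_axes(("IIWA", "box", "no_obstacle", "push"),
                  ("IIWA", "plate", "no_obstacle", "shelf")))  # 2
```

Counting shared axes is only one possible relatedness measure; it is used here because it needs nothing beyond the axis labels themselves.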