Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

Joey Hejna; Chethan Bhateja; Yichen Jiang; Karl Pertsch; Dorsa Sadigh

TL;DR

Re-Mix introduces a distributionally robust optimization framework to automatically curate multi-domain robotics data by learning domain weights that maximize worst-case downstream improvement potential. The method addresses robotics-specific challenges—unbalanced domain losses, continuous action spaces, and overfitting—via per-domain action normalization, action discretization, and early stopping of the reference model. Through extensive experiments on Bridge V2 and the OpenX RT-X data, Re-Mix-based data mixtures outperform uniform and human-curated mixes, and enable meaningful data subsetting with minimal performance loss. The approach provides a practical, scalable path to improve generalist robot policies by principled data selection rather than manual curation. It also highlights design choices—such as discretization and reference-model training—that critically impact gains from dataset curation in robotics.
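As a rough illustration of two of the preprocessing steps named above, the following sketch standardizes actions per domain and then bins them into discrete tokens. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the Gaussian-style standardization, the bin count, and the clipping range are all hypothetical choices made for illustration.

```python
import numpy as np

def normalize_per_domain(actions, domain_ids):
    """Standardize each domain's actions with that domain's own statistics.

    Assumption: zero-mean, unit-variance scaling per action dimension.
    The summary above only says "per-domain action normalization".
    """
    actions = np.asarray(actions, dtype=np.float64)
    domain_ids = np.asarray(domain_ids)
    out = np.empty_like(actions)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        mean = actions[mask].mean(axis=0)
        std = actions[mask].std(axis=0) + 1e-8  # guard against zero variance
        out[mask] = (actions[mask] - mean) / std
    return out

def discretize(actions, num_bins=256, low=-3.0, high=3.0):
    """Map normalized actions to integer bin indices in [0, num_bins).

    Assumption: uniform bins over a clipped range [low, high].
    """
    clipped = np.clip(actions, low, high)
    interior_edges = np.linspace(low, high, num_bins + 1)[1:-1]
    return np.digitize(clipped, interior_edges)
```

With actions discretized this way, every domain's loss becomes a cross-entropy over the same token vocabulary, which makes per-domain losses easier to compare than raw regression errors on differently scaled action spaces.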

Abstract

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, although data selection has been of utmost importance in vision and natural language processing, little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets for robot foundation model pre-training. Concretely, we use distributionally robust optimization (DRO) to maximize worst-case performance across all possible downstream domains. Our method, Re-Mix, addresses the wide range of challenges that arise when applying DRO to robotics datasets, including variability in action spaces and dynamics across different datasets. Re-Mix employs early stopping, action normalization, and discretization to counteract these issues. Through extensive experimentation on the largest open-source robot manipulation dataset, the Open X-Embodiment dataset, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, on the datasets used to train existing generalist robot policies (the RT-X models), domain weights learned by Re-Mix outperform uniform weights by 38% on average and human-selected weights by 32%.
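To make the minimax idea more concrete, the sketch below shows one common way a DRO-style domain reweighting step can be written: an exponentiated-gradient update that shifts weight toward domains where the current policy's loss most exceeds a reference model's loss. This is a generic group-DRO-style illustration, not the update rule stated in the abstract; the function name, the use of clipped excess loss, and the step size are assumptions.

```python
import numpy as np

def update_domain_weights(weights, policy_losses, reference_losses, step_size=0.1):
    """One exponentiated-gradient ascent step on the domain weights.

    Assumptions (not taken from the abstract):
    - the per-domain signal is the excess loss, policy minus reference,
      clipped at zero;
    - weights live on the probability simplex and are updated
      multiplicatively, then renormalized.
    """
    weights = np.asarray(weights, dtype=np.float64)
    excess = np.maximum(
        np.asarray(policy_losses, dtype=np.float64)
        - np.asarray(reference_losses, dtype=np.float64),
        0.0,
    )
    logits = np.log(weights) + step_size * excess
    logits -= logits.max()  # subtract the max for numerical stability
    new_weights = np.exp(logits)
    return new_weights / new_weights.sum()
```

For example, starting from uniform weights over three domains, a domain whose policy loss sits well above its reference loss receives a larger share after the update, while domains the policy already fits as well as the reference lose share.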

Paper Structure (33 sections, 3 equations, 7 figures, 6 tables)

This paper contains 33 sections, 3 equations, 7 figures, 6 tables.

Introduction
Related Work
Re-weighing Robotic Dataset Mixtures with Minimax Optimization
Problem Setup.
Distributionally Robust Optimization.
The Challenges of Applying Robust Optimization in Robotics
Unbalanced Losses.
Continuous Losses.
Overfitting.
Re-weighing Robotic Dataset Mixtures with Minimax Optimization
Experiments
Experimental Setup
Datasets.
Training and Evaluation Details.
Comparisons.
...and 18 more sections

Figures (7)

Figure 1: Results for curating the RT-X training mix. We test policies trained on different weightings of the data mixture used by RT-X across two WidowX (left) and two Franka (right) tabletop manipulation tasks. We find that the policy trained on the data mix curated with Re-Mix achieves the strongest performance, even outperforming the human-expert-curated data mix from RT-X. Mean $\pm$ StdErr across 4 tasks, 10 evaluations each.
Figure 2: On Bridge V2, there is no notable difference between uniform sampling and Re-Mix when training on the full dataset.
Figure 3: Results sub-setting datasets via different strategies until they reach 25% of their original size. We again use 10 evaluations per task, and show the Mean $\pm$ StdErr.
Figure 4: Ablations for design choices in Re-Mix. We ablate the effects of (left) reference model overfitting, by selecting a checkpoint once validation loss starts increasing at 150K steps, and (right) using continuous actions for Re-Mix. For ablations, we remove the "Flip Bowl" and "Cube to Plate" tasks, as all Re-Mix variants achieved 100% success.
Figure 5: Bridge 10% subsetting.
...and 2 more figures
