Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Joey Hejna, Chethan Bhateja, Yichen Jiang, Karl Pertsch, Dorsa Sadigh
TL;DR
Re-Mix introduces a distributionally robust optimization framework to automatically curate multi-domain robotics data by learning domain weights that maximize worst-case downstream improvement potential. The method addresses three robotics-specific challenges (unbalanced domain losses, continuous action spaces, and overfitting) via per-domain action normalization, action discretization, and early stopping of the reference model. In extensive experiments on Bridge V2 and the Open X-Embodiment (RT-X) data, Re-Mix data mixtures outperform uniform and human-curated mixtures, and enable meaningful data subsetting with minimal performance loss. The approach provides a practical, scalable path to improving generalist robot policies through principled data selection rather than manual curation. It also highlights design choices, such as discretization and reference-model training, that critically affect the gains from dataset curation in robotics.
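To make the two action-processing steps named above concrete, here is a minimal sketch of per-domain action normalization followed by uniform binning. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of mean/standard-deviation normalization, the clipping range of [-3, 3], and the bin count are all hypothetical.

```python
import numpy as np

def normalize_per_domain(actions, domain_ids):
    """Standardize each action dimension using statistics of its own domain.

    Assumption: zero-mean, unit-variance normalization; other schemes
    (e.g. quantile-based bounds) would fit the same interface.
    """
    actions = np.asarray(actions, dtype=np.float64)
    domain_ids = np.asarray(domain_ids)
    out = np.empty_like(actions)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        mean = actions[mask].mean(axis=0)
        std = actions[mask].std(axis=0)
        # Guard against constant action dimensions.
        out[mask] = (actions[mask] - mean) / np.maximum(std, 1e-8)
    return out

def discretize(actions, num_bins=256, low=-3.0, high=3.0):
    """Map continuous values to integer bin indices in [0, num_bins - 1].

    Assumption: uniform bins over a fixed clipping range [low, high].
    """
    clipped = np.clip(actions, low, high)
    scaled = (clipped - low) / (high - low)  # now in [0, 1]
    # The upper edge (scaled == 1) would land in bin num_bins, so cap it.
    return np.minimum((scaled * num_bins).astype(np.int64), num_bins - 1)
```

Normalizing within each domain puts action magnitudes from different robots on a comparable scale, and predicting bin indices turns the imitation loss into a classification loss, so that per-domain losses are easier to compare when they drive the mixture weights.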
Abstract
Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, although data selection has been of utmost importance in vision and natural language processing, little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets for robot foundation model pre-training. Concretely, we use distributionally robust optimization (DRO) to maximize worst-case performance across all possible downstream domains. Our method, Re-Mix, addresses the wide range of challenges that arise when applying DRO to robotics datasets, including variability in action spaces and dynamics across different datasets. Re-Mix employs early stopping, action normalization, and discretization to counteract these issues. Through extensive experimentation on the largest open-source robot manipulation dataset, the Open X-Embodiment dataset, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by Re-Mix outperform uniform weights by 38% on average and outperform human-selected weights by 32% on the datasets used to train existing generalist robot policies, namely the RT-X models.
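As a rough illustration of how DRO can produce domain weights, the sketch below performs one exponentiated-gradient step that shifts weight toward domains whose loss most exceeds that of a reference model. This is a generic group-DRO style update written for illustration; the abstract does not specify Re-Mix's exact update rule, so the function name, the excess-loss clipping at zero, the step size, and the optional smoothing toward uniform are all assumptions.

```python
import numpy as np

def update_domain_weights(weights, policy_losses, reference_losses,
                          step_size=0.1, smoothing=0.0):
    """One exponentiated-gradient ascent step on the domain weights.

    Domains where the current policy's loss most exceeds the reference
    model's loss (the "excess loss") receive more weight. Assumes all
    input weights are strictly positive and sum to one.
    """
    weights = np.asarray(weights, dtype=np.float64)
    # Assumption: negative excess loss is clipped to zero.
    excess = np.maximum(
        np.asarray(policy_losses, dtype=np.float64)
        - np.asarray(reference_losses, dtype=np.float64),
        0.0,
    )
    logits = np.log(weights) + step_size * excess
    logits -= logits.max()  # subtract the max for numerical stability
    new_weights = np.exp(logits)
    new_weights /= new_weights.sum()
    # Optional mixing with the uniform distribution keeps every domain alive.
    uniform = np.full_like(new_weights, 1.0 / new_weights.size)
    return (1.0 - smoothing) * new_weights + smoothing * uniform

# Hypothetical per-domain losses for three domains.
w = update_domain_weights(
    weights=[1 / 3, 1 / 3, 1 / 3],
    policy_losses=[1.0, 2.0, 0.5],
    reference_losses=[0.9, 1.0, 0.6],
)
```

In this toy call the second domain has the largest excess loss, so it ends up with the largest weight, while the third domain, where the policy already beats the reference, ends up with the smallest.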
