Robo-DM: Data Management For Large Robot Datasets
Kaiyuan Chen, Letian Fu, David Huang, Yanxiang Zhang, Lawrence Yunliang Chen, Huang Huang, Kush Hari, Ashwin Balakrishna, Ted Xiao, Pannag R Sanketi, John Kubiatowicz, Ken Goldberg
TL;DR
Robo-DM introduces a unified EBML-based container for multi-modal robot data (vision, language, action) to address the storage, transmission, and loading bottlenecks of large teleoperated datasets. The framework combines self-contained data storage, flexible lossy and lossless video compression, memory-mapped caching, and load balancing. Together these reduce data size by up to ~70x (lossy) and ~3.5x (lossless) and load data faster than prior formats. Evaluations on Open-X-Embodiment show substantial throughput improvements with minimal degradation on downstream tasks. Case studies on fine-tuning Octo and training In-Context Robot Transformer demonstrate practical utility. Overall, Robo-DM enables scalable, cost-efficient training of large robotic models by streamlining data collection and management and by integrating with existing ML pipelines and ROS2 tooling.
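EBML (specified in RFC 8794) frames every element as an Element ID, a variable-length Element Data Size, and the element's payload. This framing lets a single self-contained file interleave video, language, and action data. The sketch below shows only that framing, and it rests on several assumptions. The element IDs 0x81, 0x82, and 0x83 and the `write_trajectory` layout are hypothetical placeholders, not the schema Robo-DM actually defines. Actions are assumed to be little-endian float32 vectors, and frames are opaque, already-encoded bytes.

```python
import struct
from typing import Iterator, List, Tuple


def encode_size(value: int) -> bytes:
    """Encode an Element Data Size as an EBML variable-size integer (VINT)."""
    for width in range(1, 9):
        # A width-w VINT has w-1 leading zeros, a marker bit, and 7*w data bits.
        # The all-ones data pattern means "unknown size", so it is never emitted.
        if value <= 2 ** (7 * width) - 2:
            return ((1 << (7 * width)) | value).to_bytes(width, "big")
    raise ValueError("size does not fit in an 8-byte VINT")


def decode_size(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a VINT at buf[pos]; return (value, position just after the VINT)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("VINT wider than 8 bytes")
    width = 9 - first.bit_length()  # number of leading zero bits + 1
    raw = int.from_bytes(buf[pos:pos + width], "big")
    return raw & ((1 << (7 * width)) - 1), pos + width  # strip the marker bit


def encode_element(element_id: int, payload: bytes) -> bytes:
    """Frame one element as Element ID + Element Data Size + Element Data."""
    id_bytes = element_id.to_bytes((element_id.bit_length() + 7) // 8, "big")
    return id_bytes + encode_size(len(payload)) + payload


def iter_elements(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Walk consecutive top-level elements, yielding (element_id, payload)."""
    pos = 0
    while pos < len(buf):
        # Element IDs keep their marker bit, so the same width rule applies.
        id_width = 9 - buf[pos].bit_length()
        element_id = int.from_bytes(buf[pos:pos + id_width], "big")
        size, start = decode_size(buf, pos + id_width)
        yield element_id, buf[start:start + size]
        pos = start + size


# Hypothetical element IDs for illustration only; not Robo-DM's actual IDs.
LANGUAGE_ID, ACTION_ID, FRAME_ID = 0x81, 0x82, 0x83


def write_trajectory(instruction: str, actions: List[Tuple[float, ...]],
                     frames: List[bytes]) -> bytes:
    """Pack a language instruction plus per-step actions and frames into one buffer."""
    out = bytearray(encode_element(LANGUAGE_ID, instruction.encode("utf-8")))
    for action, frame in zip(actions, frames):
        out += encode_element(ACTION_ID, struct.pack(f"<{len(action)}f", *action))
        out += encode_element(FRAME_ID, frame)
    return bytes(out)
```

Each payload carries its own length, so a reader can skip over modalities it does not need without decoding them. Payloads of 127 bytes or more simply take a wider size field.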
Abstract
Recent results suggest that very large datasets of teleoperated robot demonstrations can be used to train transformer-based models that have the potential to generalize to new scenes, robots, and tasks. However, curating, distributing, and loading large datasets of robot trajectories remains challenging. These trajectories typically consist of video, textual, and numerical modalities, including streams from multiple cameras. We propose Robo-DM, an efficient open-source cloud-based data management toolkit for collecting, sharing, and learning with robot data. With Robo-DM, robot datasets are stored in a self-contained format based on the Extensible Binary Meta Language (EBML). Robo-DM can significantly reduce the size of robot trajectory data, transfer costs, and data load time during training. Compared to the RLDS format used in the Open X-Embodiment (OXE) datasets, Robo-DM's compression saves space by up to 70x (lossy) and 3.5x (lossless). Robo-DM also accelerates data retrieval by load-balancing video decoding with memory-mapped decoding caches. Compared to LeRobot, a framework that also uses lossy video compression, Robo-DM is up to 50x faster when decoding sequentially. We physically evaluate In-Context Robot Transformer on a pick-and-place task after training it with Robo-DM on lossy-compressed data. Even at 75x compression of the original dataset, Robo-DM incurs no reduction in downstream task accuracy.
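Part of the loading speedup described above comes from memory-mapped decoding caches. The sketch below illustrates the general idea only and is not Robo-DM's implementation. It decodes a video once and persists the raw frames as fixed-size records in a flat file. Later reads are served by slicing an `mmap`, so subsequent epochs skip decoding entirely. The class `DecodedFrameCache`, the fixed frame size, and the `decode_all` callback are hypothetical. A real pipeline would plug in an actual video decoder and would also balance decoding work across workers, which this sketch omits.

```python
import mmap
import os
import tempfile
from typing import Callable, List


class DecodedFrameCache:
    """Hypothetical memory-mapped cache of fixed-size decoded frames."""

    def __init__(self, path: str, frame_nbytes: int,
                 decode_all: Callable[[], List[bytes]]):
        self.frame_nbytes = frame_nbytes
        if not os.path.exists(path):
            # Cache miss: decode the whole video once and persist raw frames.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                for frame in decode_all():
                    if len(frame) != frame_nbytes:
                        raise ValueError("all frames must share one size")
                    f.write(frame)
            os.replace(tmp, path)  # atomic rename: no half-written cache
        self._file = open(path, "rb")
        # Read-only mapping; the OS pages frames in lazily on access.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._mm) // self.frame_nbytes

    def __getitem__(self, index: int) -> bytes:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.frame_nbytes
        return self._mm[start:start + self.frame_nbytes]

    def close(self) -> None:
        self._mm.close()
        self._file.close()


def demo() -> tuple:
    """Open the cache twice on one path; the stand-in decoder runs only once."""
    calls = []

    def fake_decode() -> List[bytes]:  # hypothetical stand-in for a video decoder
        calls.append(1)
        return [bytes([i]) * 4 for i in range(3)]

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "episode0.frames")
        first = DecodedFrameCache(path, 4, fake_decode)
        n, last = len(first), first[2]
        first.close()
        second = DecodedFrameCache(path, 4, fake_decode)
        head = second[0]
        second.close()
    return n, last, head, len(calls)
```

Storing frames as fixed-size records makes random access a single offset computation. The price is that the cache holds decoded pixels, so it is much larger on disk than the compressed video it was built from.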
