EMMA: Scaling Mobile Manipulation via Egocentric Human Data
Lawrence Y. Zhu, Pranav Kuppili, Ryan Punamiya, Patcharapong Aphiwetsa, Dhruv Patel, Simar Kareer, Sehoon Ha, Danfei Xu
TL;DR
EMMA tackles the data bottleneck in mobile manipulation by learning from egocentric human demonstrations complemented with static robot data, bypassing mobile teleoperation. It introduces a data retargeting step, a unified decoder-transformer architecture for cross-embodiment co-training, and an unsupervised phase-identification module to switch between navigation and manipulation. Across four real-world tasks, EMMA matches or surpasses teleoperation-based baselines, generalizes to unseen environments, and shows favorable scaling with more human data. This work suggests a scalable data paradigm for mobile manipulation in real-world environments.
Abstract
Scaling imitation learning for mobile manipulation is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework for training mobile manipulation policies from human mobile manipulation data together with static robot data, sidestepping mobile teleoperation. To accomplish this, we co-train on human full-body motion data and static robot data. In our experiments across three real-world tasks, EMMA performs comparably to baselines trained on teleoperated mobile robot data (Mobile ALOHA), matching or exceeding them on full task success. We find that EMMA generalizes to new spatial configurations and scenes, and we observe that performance improves as we increase the hours of human data, opening new avenues for scalable robotic learning in real-world environments. Details of this project can be found at https://ego-moma.github.io/.
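To make the co-training idea concrete, the sketch below shows one common way to mix two data sources in every training batch. It is an illustration under stated assumptions, not EMMA's actual data pipeline: the function name `cotrain_batches`, the `human_fraction` parameter, and the fixed per-batch ratio are all hypothetical choices made for this example.

```python
import random
from typing import Iterator, List, Tuple

# (embodiment tag, index within that embodiment's dataset)
Sample = Tuple[str, int]


def cotrain_batches(
    num_human: int,
    num_robot: int,
    batch_size: int,
    human_fraction: float = 0.5,  # hypothetical mixing ratio, not from the paper
    seed: int = 0,
) -> Iterator[List[Sample]]:
    """Yield batches that mix human and static-robot samples at a fixed ratio."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must lie in [0, 1]")
    if num_human <= 0 or num_robot <= 0 or batch_size <= 0:
        raise ValueError("dataset sizes and batch_size must be positive")

    rng = random.Random(seed)
    n_human = round(batch_size * human_fraction)
    n_robot = batch_size - n_human

    while True:
        # Sample with replacement from each source so that the smaller
        # dataset is not exhausted before the larger one.
        batch: List[Sample] = [
            ("human", rng.randrange(num_human)) for _ in range(n_human)
        ]
        batch += [("robot", rng.randrange(num_robot)) for _ in range(n_robot)]
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    first = next(cotrain_batches(num_human=100, num_robot=20, batch_size=8))
    print(first)
```

In a setup like this, the embodiment tag lets a shared policy network route each sample to the matching observation and action heads, while the ratio controls how much each source contributes to a gradient step.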
