UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers
Huy Ha, Yihuai Gao, Zipeng Fu, Jie Tan, Shuran Song
TL;DR
UMI-on-Legs presents a scalable framework for mobile manipulation on quadrupeds. It pairs real-world demonstrations collected with a hand-held gripper with simulation-trained whole-body controllers (WBCs) that track end-effector trajectories in a task frame. The key idea is an embodiment-agnostic interface: a diffusion-based manipulation policy proposes end-effector targets, and a high-frequency WBC tracks them on the robot to carry out the task. Across dynamic tossing, pushing, and cross-embodiment cup rearrangement, the approach achieves over 70% success in real and simulated tasks and demonstrates zero-shot transfer of a policy trained for a fixed-base arm to a legged platform. The work highlights a practical path for porting expressive manipulation skills to mobile, dynamic robots: decouple task-space planning from embodiment-specific control, and collect real-world data with lightweight, accessible sensing.
Abstract
We introduce UMI-on-Legs, a new framework that combines real-world and simulation data for quadruped manipulation systems. We scale task-centric data collection in the real world using a hand-held gripper (UMI), providing a cheap way to demonstrate task-relevant manipulation skills without a robot. Simultaneously, we scale robot-centric data in simulation by training a whole-body controller for task-tracking without task simulation setups. The interface between these two policies is end-effector trajectories in the task frame, inferred by the manipulation policy and passed to the whole-body controller for tracking. We evaluate UMI-on-Legs on prehensile, non-prehensile, and dynamic manipulation tasks, and report over 70% success rate on all tasks. Lastly, we demonstrate the zero-shot cross-embodiment deployment of a pre-trained manipulation policy checkpoint from prior work, originally intended for a fixed-base robot arm, on our quadruped system. We believe this framework provides a scalable path towards learning expressive manipulation skills on dynamic robot embodiments. Please check out our website for robot videos, code, and data: https://umi-on-legs.github.io
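
To make the interface between the two policies concrete, the following is a minimal illustrative sketch, not the released implementation. It assumes the manipulation policy emits a short list of timestamped end-effector positions in the task frame and that the whole-body controller samples that list at its own, faster rate. The names `Waypoint`, `target_at`, and `CONTROL_DT`, the 50 Hz control rate, and the waypoint values are all hypothetical. Orientation and gripper width are omitted for brevity, and linear interpolation is an assumed choice.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Waypoint:
    """One end-effector target produced by the manipulation policy (hypothetical type)."""
    t: float        # seconds, on a clock shared by policy and controller
    position: Vec3  # end-effector position in the task frame, in metres


def target_at(trajectory: List[Waypoint], t: float) -> Vec3:
    """Linearly interpolate the target position at time t, clamping at both ends.

    Assumes waypoint times are strictly increasing.
    """
    times = [w.t for w in trajectory]
    i = bisect_right(times, t)
    if i == 0:                      # before the first waypoint
        return trajectory[0].position
    if i == len(trajectory):        # at or after the last waypoint
        return trajectory[-1].position
    a, b = trajectory[i - 1], trajectory[i]
    alpha = (t - a.t) / (b.t - a.t)
    return tuple(pa + alpha * (pb - pa) for pa, pb in zip(a.position, b.position))


# Hypothetical low-rate policy output: three waypoints spanning one second.
trajectory = [
    Waypoint(0.0, (0.0, 0.0, 0.5)),
    Waypoint(0.5, (0.1, 0.0, 0.5)),
    Waypoint(1.0, (0.1, 0.2, 0.6)),
]

# Hypothetical high-rate controller loop: one target per control step.
CONTROL_DT = 0.02  # assumed 50 Hz control rate, chosen only for illustration
targets = [target_at(trajectory, k * CONTROL_DT) for k in range(51)]
```

Because the controller only ever sees task-frame end-effector targets, the same policy output could in principle be handed to a different embodiment's controller, which is the property the cross-embodiment experiment relies on.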
