DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
Mengda Xu, Han Zhang, Yifan Hou, Zhenjia Xu, Linxi Fan, Manuela Veloso, Shuran Song
TL;DR
DexUMI tackles the embodiment gap in transferring dexterous manipulation skills from humans to diverse robot hands by coupling a wearable hand exoskeleton with a software pipeline that replaces human hands in demonstrations with robot hands. The hardware adaptation aligns human fingertip motion to the target robot hand while preserving wearability and capturing accurate joint angles and tactile data; the software adaptation renders robot-hand visuals to produce training data with consistent observations. Real-world experiments on Inspire and XHand across four tasks show 86% average success and a 3.2× increase in data-collection efficiency over teleoperation, with analyses highlighting the benefits of relative finger actions and tactile input under different conditions. Overall, DexUMI presents a scalable approach for efficient, cross-hardware dexterous policy learning using human-in-the-loop demonstrations aligned with robot capabilities.
Abstract
We present DexUMI, a data collection and policy learning framework that uses the human hand as the natural interface to transfer dexterous manipulation skills to various robot hands. DexUMI includes hardware and software adaptations to minimize the embodiment gap between the human hand and various robot hands. The hardware adaptation bridges the kinematics gap using a wearable hand exoskeleton. It allows direct haptic feedback during manipulation data collection and adapts human motion to feasible robot hand motion. The software adaptation bridges the visual gap by replacing the human hand in video data with high-fidelity robot hand inpainting. We demonstrate DexUMI's capabilities through comprehensive real-world experiments on two different dexterous robot hand platforms, achieving an average task success rate of 86%.
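To make the software adaptation concrete, the following is a minimal sketch of the per-frame compositing step that "replacing the human hand with a robot hand" implies. It is an illustration under stated assumptions, not the authors' implementation: the function name `replace_hand`, the boolean masks, the pre-inpainted `background` image, and the rendered `robot_rgb` image are all hypothetical inputs that would come from upstream segmentation, inpainting, and rendering stages not shown here.

```python
import numpy as np


def replace_hand(frame, hand_mask, background, robot_rgb, robot_mask):
    """Swap the human hand in one frame for a rendered robot hand.

    Hypothetical helper for illustration only.
    frame, background, robot_rgb: (H, W, 3) uint8 images of equal size.
    hand_mask, robot_mask: (H, W) boolean masks.
    """
    out = frame.copy()  # leave the input frame untouched
    # Step 1: erase the human hand by filling its pixels from an
    # already-inpainted background image (assumed to be given).
    out[hand_mask] = background[hand_mask]
    # Step 2: paste the rendered robot hand on top of the cleaned frame.
    out[robot_mask] = robot_rgb[robot_mask]
    return out


# Tiny synthetic example: constant-colour images and square masks.
frame = np.full((4, 4, 3), 100, dtype=np.uint8)
background = np.full((4, 4, 3), 10, dtype=np.uint8)
robot_rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
hand_mask = np.zeros((4, 4), dtype=bool)
hand_mask[1:3, 1:3] = True
robot_mask = np.zeros((4, 4), dtype=bool)
robot_mask[2:4, 2:4] = True

out = replace_hand(frame, hand_mask, background, robot_rgb, robot_mask)
```

In this sketch, pixels covered by both masks end up showing the robot hand, because the overlay is applied after the erase step. A real pipeline would also need to handle occlusion between the hand and manipulated objects, which this example deliberately ignores.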
