FastUMI: A Scalable and Hardware-Independent Universal Manipulation Interface with Dataset
Zhaxizhuoma, Kehui Liu, Chuyue Guan, Zhongjie Jia, Ziniu Wu, Xin Liu, Tianyu Wang, Shuai Liang, Pengan Chen, Pingrui Zhang, Haoming Song, Delin Qu, Dong Wang, Zhigang Wang, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
TL;DR
FastUMI addresses the scarcity and cost of real-world robotic manipulation data with a hardware-decoupled, plug-and-play data collection system that supports both handheld demonstration and robot-mounted execution. It replaces the original UMI Visual-Inertial Odometry (VIO) pipeline with off-the-shelf RealSense T265 tracking, provides a software pipeline for data collection and verification, and releases an open dataset of more than 10,000 demonstrations spanning 22 tasks to support imitation-learning research. The authors introduce algorithmic adaptations (Smooth-ACT and PoseACT for first-person perspectives, and Depth-Enhanced DP) along with a dynamic error-compensation mechanism to maintain alignment across diverse hardware. The result is a scalable, cost-effective platform that maintains robust performance across varied manipulation scenarios, with reported improvements on depth-sensitive tasks and potential for cross-platform transfer. The open dataset and modular framework are intended to advance data-driven robotic learning in diverse real-world environments.
Abstract
Real-world manipulation data involving robotic arms is crucial for developing generalist action policies, yet such data remains scarce since existing data collection methods are hindered by high costs, hardware dependencies, and complex setup requirements. In this work, we introduce FastUMI, a substantial redesign of the Universal Manipulation Interface (UMI) system that addresses these challenges by enabling rapid deployment, simplifying hardware-software integration, and delivering robust performance in real-world data acquisition. Compared with UMI, FastUMI has several advantages: 1) It adopts a decoupled hardware design and incorporates extensive mechanical modifications, removing dependencies on specialized robotic components while preserving consistent observation perspectives. 2) It also refines the algorithmic pipeline by replacing complex Visual-Inertial Odometry (VIO) implementations with an off-the-shelf tracking module, significantly reducing deployment complexity while maintaining accuracy. 3) FastUMI includes an ecosystem for data collection, verification, and integration with both established and newly developed imitation learning algorithms, accelerating policy learning advancement. Additionally, we have open-sourced a high-quality dataset of over 10,000 real-world demonstration trajectories spanning 22 everyday tasks, forming one of the most diverse UMI-like datasets to date. Experimental results confirm that FastUMI facilitates rapid deployment, reduces operational costs and labor demands, and maintains robust performance across diverse manipulation scenarios, thereby advancing scalable data-driven robotic learning.
