ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation
Shiqi Yang, Minghuan Liu, Yuzhe Qin, Runyu Ding, Jialong Li, Xuxin Cheng, Ruihan Yang, Sha Yi, Xiaolong Wang
TL;DR
ACE addresses the need for low-cost, cross-platform dexterous teleoperation to collect broad demonstrations for learning-based manipulation. It combines a hand-facing camera for 3D hand pose estimation with dual exoskeleton bases. Wrist poses are tracked through forward kinematics (FK) of the exoskeleton, and operator motion is mapped to various robot morphologies through inverse-kinematics-based retargeting. The end-effector target follows the explicit mapping $ \mathbf{x}_e = \gamma (\mathbf{x}_h - \mathbf{c}_h) + \mathbf{c}_t $, and hand retargeting solves the constrained optimization $ \min_{q_t} \sum_{i=0}^N \left\| \alpha v_{it} - f_i(q_t) \right\|^2 + \beta \left\| q_t - q_{t-1} \right\|^2 $ subject to $ q_l \le q_t \le q_u $. Empirically, ACE achieves ~1 mm FK accuracy and ~3 mm target-reaching error across workspace scales, and it supports imitation learning pipelines on multiple robot platforms, underscoring its potential for scalable dexterous manipulation research.
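The two formulas above can be illustrated with a minimal sketch. It is not the authors' implementation. It reads $\gamma$ as a workspace scale factor and $\mathbf{c}_h$, $\mathbf{c}_t$ as operator-side and robot-side reference centers, which is an interpretation of the symbols rather than something stated here. The forward-kinematics function `planar_fk` is a hypothetical two-link toy standing in for the robot hand's $f_i$, and the solver is plain projected gradient descent with a numerical gradient, chosen only for self-containment. All function names and default parameters are assumptions.

```python
import numpy as np

def map_wrist_to_end_effector(x_h, c_h, c_t, gamma):
    """Affine mapping x_e = gamma * (x_h - c_h) + c_t."""
    x_h, c_h, c_t = (np.asarray(a, dtype=float) for a in (x_h, c_h, c_t))
    return gamma * (x_h - c_h) + c_t

def planar_fk(q, link_lengths=(1.0, 1.0)):
    """Toy stand-in for f_i: tip position of a planar two-link chain."""
    l1, l2 = link_lengths
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])

def retarget_cost(q, targets, fk_fns, q_prev, alpha, beta):
    """sum_i ||alpha * v_i - f_i(q)||^2 + beta * ||q - q_prev||^2."""
    tracking = sum(np.sum((alpha * v - f(q)) ** 2)
                   for v, f in zip(targets, fk_fns))
    return tracking + beta * np.sum((q - q_prev) ** 2)

def numerical_grad(fn, q, eps=1e-6):
    """Central-difference gradient of a scalar function fn at q."""
    grad = np.zeros_like(q)
    for k in range(q.size):
        delta = np.zeros_like(q)
        delta[k] = eps
        grad[k] = (fn(q + delta) - fn(q - delta)) / (2.0 * eps)
    return grad

def retarget(targets, fk_fns, q_prev, q_l, q_u,
             alpha=1.0, beta=1e-3, step=0.05, iters=1000):
    """Projected gradient descent; clipping enforces q_l <= q <= q_u."""
    q_prev = np.asarray(q_prev, dtype=float)
    q = np.clip(q_prev, q_l, q_u)
    cost = lambda x: retarget_cost(x, targets, fk_fns, q_prev, alpha, beta)
    for _ in range(iters):
        q = np.clip(q - step * numerical_grad(cost, q), q_l, q_u)
    return q

# Usage: one fingertip target generated from a known joint configuration.
target = planar_fk(np.array([0.5, 0.8]))
q_limits = (np.zeros(2), np.full(2, np.pi / 2))
q_sol = retarget([target], [planar_fk], q_prev=[0.3, 0.6],
                 q_l=q_limits[0], q_u=q_limits[1])
```

The $\beta$ term pulls each solution toward the previous frame's joint angles, which damps frame-to-frame jitter at the cost of a small tracking bias. A real system would likely use an analytic Jacobian and a dedicated constrained solver rather than finite differences.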
Abstract
Learning from demonstrations has been shown to be an effective approach to robotic manipulation, especially given the large-scale robot data recently collected with teleoperation systems. Building an efficient teleoperation system across diverse robot platforms has become more crucial than ever. However, there is a notable lack of cost-effective and user-friendly teleoperation systems that support different end-effectors, e.g., anthropomorphic robot hands and grippers, and that can operate across multiple platforms. To address this issue, we develop ACE, a cross-platform visual-exoskeleton system for low-cost dexterous teleoperation. Our system uses a hand-facing camera to capture 3D hand poses and an exoskeleton mounted on a portable base, enabling accurate real-time capture of both finger and wrist poses. Compared to previous systems, which often require hardware customization for each robot, our single system generalizes to humanoid hands, arm-hands, arm-gripper, and quadruped-gripper systems with high-precision teleoperation. This enables imitation learning for complex manipulation tasks on diverse platforms.
