ACE: A Cross-Platform Visual-Exoskeletons System for Low-Cost Dexterous Teleoperation

Shiqi Yang, Minghuan Liu, Yuzhe Qin, Runyu Ding, Jialong Li, Xuxin Cheng, Ruihan Yang, Sha Yi, Xiaolong Wang

TL;DR

ACE addresses the need for low-cost, cross-platform dexterous teleoperation for collecting broad demonstrations for learning-based manipulation. It combines a hand-facing camera for 3D hand pose estimation with dual exoskeleton bases, tracks the wrist through forward kinematics, and maps operator motion to various robot morphologies through inverse-kinematics-based retargeting. The retargeting uses the explicit mapping $ \mathbf{x}_e = \gamma (\mathbf{x}_h - \mathbf{c}_h) + \mathbf{c}_t $ and the constrained optimization $ \min_{q_t} \sum_{i=0}^N \left\| \alpha v_{it} - f_i(q_t) \right\|^2 + \beta \left\| q_t - q_{t-1} \right\|^2 $ subject to $ q_l \le q_t \le q_u $. Empirically, ACE achieves ~1 mm FK accuracy and ~3 mm target-reaching error across workspace scales, and it supports imitation learning pipelines on multiple robot platforms, underscoring its potential for scalable dexterous manipulation research.
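The two formulas above can be made concrete with a small sketch. It is illustrative only and is not the authors' implementation. The symbol readings are assumptions, since this summary does not define them: $\mathbf{x}_h$ is taken as the operator-side position, $\mathbf{c}_h$ and $\mathbf{c}_t$ as the operator and target workspace centers, $\gamma$ as a workspace scale, $v_{it}$ as the $i$-th hand keypoint vector at time $t$, $f_i$ as the robot's forward kinematics for the matching keypoint, and $\beta$ as a smoothness weight. The function names (`map_position`, `fingertip`, `retarget`), the planar two-joint finger, and the projected-gradient solver are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import math

def map_position(x_h, c_h, c_t, gamma):
    """Explicit mapping x_e = gamma * (x_h - c_h) + c_t, per coordinate."""
    return [gamma * (h - ch) + ct for h, ch, ct in zip(x_h, c_h, c_t)]

def fingertip(q, lengths=(1.0, 0.8)):
    """Hypothetical f_i(q): tip position of a planar two-joint finger."""
    l1, l2 = lengths
    return (l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1]),
            l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1]))

def cost(q, v, q_prev, alpha, beta):
    """Tracking term plus smoothness term for a single keypoint."""
    fx, fy = fingertip(q)
    track = (alpha * v[0] - fx) ** 2 + (alpha * v[1] - fy) ** 2
    smooth = sum((a - b) ** 2 for a, b in zip(q, q_prev))
    return track + beta * smooth

def retarget(v, q_prev, q_lo, q_hi, alpha=1.0, beta=1e-3,
             lr=0.1, steps=2000, eps=1e-6):
    """Projected gradient descent: step downhill, then clip to the joint limits."""
    q = list(q_prev)
    for _ in range(steps):
        grad = []
        for j in range(len(q)):
            # Central finite difference of the cost along joint j.
            q_plus = q[:j] + [q[j] + eps] + q[j + 1:]
            q_minus = q[:j] + [q[j] - eps] + q[j + 1:]
            grad.append((cost(q_plus, v, q_prev, alpha, beta)
                         - cost(q_minus, v, q_prev, alpha, beta)) / (2 * eps))
        # Enforce q_l <= q_t <= q_u by clipping after each step.
        q = [min(max(qj - lr * g, lo), hi)
             for qj, g, lo, hi in zip(q, grad, q_lo, q_hi)]
    return q

# Made-up numbers: scale an operator position into a larger target workspace.
x_e = map_position([0.3, 0.1, 0.2], [0.1, 0.1, 0.1], [0.5, 0.0, 0.3], gamma=2.0)
print("mapped position:", x_e)

# Made-up target: a keypoint vector the toy finger can reach exactly.
v_target = fingertip([0.4, 0.6])
q_t = retarget(v_target, q_prev=[0.2, 0.3], q_lo=[0.0, 0.0], q_hi=[1.5, 1.5])
print("retargeted joints:", q_t)
```

The smoothness weight pulls each solution slightly toward the previous joint configuration, which is one way to read the role of the $\beta \| q_t - q_{t-1} \|^2$ term: it trades a small tracking error for less frame-to-frame jitter. A real system would more likely use an off-the-shelf bounded solver and the robot's actual kinematic model rather than finite differences on a toy chain.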

Abstract

Learning from demonstrations has been shown to be an effective approach to robotic manipulation, especially with the large-scale robot data recently collected through teleoperation systems. Building an efficient teleoperation system that works across diverse robot platforms has become more crucial than ever. However, there is a notable lack of cost-effective and user-friendly teleoperation systems that support different end-effectors, e.g., anthropomorphic robot hands and grippers, and that can operate across multiple platforms. To address this issue, we develop ACE, a cross-platform visual-exoskeleton system for low-cost dexterous teleoperation. Our system utilizes a hand-facing camera to capture 3D hand poses and an exoskeleton mounted on a portable base, enabling accurate real-time capture of both finger and wrist poses. Compared to previous systems, which often require hardware customization for each robot, our single system generalizes to humanoid hands, arm-hands, arm-gripper, and quadruped-gripper systems with high-precision teleoperation. This enables imitation learning for complex manipulation tasks on diverse platforms.

Paper Structure (15 sections, 4 equations, 8 figures, 4 tables, 3 algorithms)

Figures (8)

  • Figure 1: An Overview of the Proposed ACE System. The system consists of two bimanual exoskeleton arms and two cameras for hand pose tracking. Together with our modular design of the base, we can perform teleoperation across a wide range of end effectors and robot platforms.
  • Figure 2: Architecture of the ACE Teleoperation System. Our system reads the joint angles from our exoskeleton motors and the hand image to estimate the wrist and hand poses through forward kinematics and a hand detection algorithm. With different modes of operation, we can perform teleoperation on different end-effectors and robot platforms.
  • Figure 3: Hardware Components. Left: assembled exoskeleton on a fixed desktop base. Right: parts for one arm. We show two wrist connectors and links of different sizes.
  • Figure 4: Details of Cross-Platform Teleoperation. This figure shows the control modes used to teleoperate different robots, including normal/mirror modes and hand/gripper configurations, and illustrates how the dual-base setup applies in various scenarios. In the Dual-Arm+Hands setup, normal mode with a desktop base is used to control the xArm with the Ability Hand. The Humanoid+Hands setup employs mirror mode with a desktop base to control the robot. In the Quadruped+Gripper setup, the right hand controls the robot in normal mode, while the left hand uses simple poses to command the quadruped’s movement at a fixed velocity through a pre-trained low-level control policy. The mobile base allows the operator to follow the robot’s movements, enabling better control.
  • Figure 5: Examples of Cross-Platform Teleoperation. Examples 1-3 are performed on the xArm with Ability Hand setup: 1) stacking, 2) serving coffee, and 3) soldering. Examples 4-7 are executed on the H1 with Inspire Hand setup: 4) spraying, 5) passing, 6) pipetting, and 7) inserting tennis. Example 8 is on the GR-1 with a gripper, performing a box-packing task. Example 9 features the B1 with Z1 setup, demonstrating a shopping-cart-pushing task. Example 10 is on the Franka with a gripper, picking up miscellaneous objects.
  • ...and 3 more figures