Tilde: Teleoperation for Dexterous In-Hand Manipulation Learning with a DeltaHand

Zilin Si, Kevin Lee Zhang, Zeynep Temel, Oliver Kroemer

TL;DR

Tilde tackles the challenge of dexterous in-hand manipulation by fusing a low-cost DeltaHand with a kinematic twin teleoperation interface (TeleHand) and diffusion-policy imitation learning. The system enables high-quality human demonstrations and end-to-end real-world policy learning, achieving an average success rate of $90\%$ across seven manipulation tasks. Key contributions include the DeltaHand redesign for higher force and precision, the TeleHand interface for precise one-to-one joint control, and diffusion-policy learning with DAgger and data augmentation for robustness. The approach demonstrates practical data efficiency and potential for generalization in unstructured environments, offering a scalable platform for advancing dexterous in-hand manipulation research.

Abstract

Dexterous robotic manipulation remains a challenging domain due to its strict demands for precision and robustness in both hardware and software. While dexterous robotic hands have demonstrated remarkable capabilities in complex tasks, efficiently learning adaptive control policies for them remains a significant hurdle given the high dimensionality of both hands and tasks. To bridge this gap, we propose Tilde, an imitation learning-based in-hand manipulation system built on a dexterous DeltaHand. It leverages 1) a low-cost, configurable, simple-to-control, soft dexterous robotic hand, DeltaHand, 2) a user-friendly, precise, real-time teleoperation interface, TeleHand, and 3) an efficient and generalizable imitation learning approach with diffusion policies. Our proposed TeleHand is designed as a kinematic twin of the DeltaHand, which enables precise one-to-one joint control of the DeltaHand during teleoperation. This facilitates efficient, high-quality collection of human demonstrations in the real world. To evaluate the effectiveness of our system, we demonstrate the fully autonomous closed-loop deployment of diffusion policies learned from demonstrations across seven dexterous manipulation tasks, with an average success rate of 90%.

Paper Structure (40 sections, 15 figures, 6 tables)

Figures (15)

  • Figure 1: $\widetilde{\mathit{Tilde}}$: $\underline{T}$eleoperation for Dexterous $\underline{I}$n-Hand Manipulation $\underline{L}$earning with a $\underline{De}$ltaHand. We introduce an imitation learning-based in-hand manipulation system with a dexterous DeltaHand. We present a kinematic twin teleoperation interface, TeleHand, to collect demonstrations on seven dexterous manipulation tasks, such as shape insertion shown above. By using vision-conditioned diffusion policies, the DeltaHand can autonomously complete the tasks.
  • Figure 2: (a) A DeltaHand with an in-hand RGB camera. A kinematic twin teleoperation interface including (b) a DeltaHand and (e) a TeleHand. The TeleHand uses linear sliders with potentiometers to record the joint states of each finger. The DeltaHand reproduces the motions of the TeleHand by using the TeleHand's potentiometer readings as desired joint positions for its linear actuators. (c) The DeltaHand's fingers have 3D-printed rigid-core embedded links and edged joints, which increase the stiffness of each finger and enable it to exert more force. (d) The TeleHand's fingers have 3D-printed soft links and curved joints, which make each finger more compliant. Users therefore need less force to move the TeleHand, which makes teleoperation easier. (f)-(i) In-hand camera images that capture the object and the DeltaHand's fingers. (j) The TeleHand's joint states indicate the movement of each finger during a demonstration.
  • Figure 3: Experimental setup. We mount a DeltaHand on a Franka robot arm. We pre-set the height and location of the Franka arm on top of the experiment workspace. An external RGB camera is mounted in front of the experiment workspace.
  • Figure 4: Task gallery. We evaluate our system on seven dexterous manipulation tasks: (a) Grasp, (b) Block Slide, (c) Block Lift, (d) Ball Roll, (e) Cap Twist, (f) Syringe Push, and (g) Shape Insert. Task goals are indicated by blue arrows in the initial images of the task trajectories. For tasks (a)-(d), white dashed lines separate the training objects from the additional unseen test objects.
  • Figure 5: Qualitative comparisons between task executions from policies trained before and after DAgger demonstrations. By refining the policies with corrective demonstrations from failure cases, the policies can handle these challenging scenarios.
  • ...and 10 more figures
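
The one-to-one joint mapping described in the Figure 2 caption (TeleHand potentiometer readings used as desired joint positions for the DeltaHand's linear actuators) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the `SliderCalibration` type, the function names, the 10-bit ADC range, and the 20 mm actuator stroke are all hypothetical, and the linear, clamped mapping is only one plausible way to realize such a twin interface.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SliderCalibration:
    """Hypothetical calibration for one TeleHand slider potentiometer."""
    adc_min: int      # raw reading with the slider fully retracted (assumed)
    adc_max: int      # raw reading with the slider fully extended (assumed)
    stroke_mm: float  # travel of the matching DeltaHand actuator (assumed)


def reading_to_target_mm(raw: int, cal: SliderCalibration) -> float:
    """Linearly map one raw potentiometer reading to an actuator target in mm."""
    span = cal.adc_max - cal.adc_min
    if span <= 0:
        raise ValueError("adc_max must be greater than adc_min")
    fraction = (raw - cal.adc_min) / span
    # Clamp so that sensor noise cannot command a target outside the stroke.
    fraction = min(1.0, max(0.0, fraction))
    return fraction * cal.stroke_mm


def telehand_to_deltahand(
    raws: Sequence[int], cals: Sequence[SliderCalibration]
) -> List[float]:
    """Map every TeleHand joint reading to the same-index DeltaHand joint."""
    if len(raws) != len(cals):
        raise ValueError("need exactly one calibration per joint reading")
    return [reading_to_target_mm(r, c) for r, c in zip(raws, cals)]


if __name__ == "__main__":
    # Assumed values: 10-bit ADC and a 20 mm stroke, for illustration only.
    cal = SliderCalibration(adc_min=0, adc_max=1023, stroke_mm=20.0)
    print(telehand_to_deltahand([0, 512, 1023], [cal, cal, cal]))
```

Because the two hands share the same kinematics in this scheme, no inverse kinematics or retargeting step is needed: each slider reading drives the actuator with the same index, and the recorded joint targets can double as the action labels for imitation learning.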