Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR

Kelin Li, Shubham M Wagh, Nitish Sharma, Saksham Bhadani, Wei Chen, Chang Liu, Petar Kormushev

TL;DR

This work tackles data-efficient imitation learning for robotic manipulation by pairing an immersive VR teleoperation platform with a haptic-enabled, transformer-based framework, Haptic-ACT. The VR setup enables remote, dexterous demonstrations using a SenseGlove for tactile feedback, while latency-resilient control and a digital-twin inverse kinematics (IK) pipeline support stable real-world execution. Haptic-ACT extends ACT by adding five fingertip forces to the observations, using a CVAE-driven style variable and transformer-based action chunking, and is trained with a combined MSE and KL objective. Across MuJoCo simulation and real-robot experiments, Haptic-ACT yields softer, more human-like grasps, reducing fingertip forces by about 15–25% at comparable task success. This points to better handling of delicate or deformable objects and to the practical value of tactile feedback in learning from demonstration.
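To make the "combined MSE and KL objective" concrete, the following is a minimal sketch of how such a loss is commonly assembled for a CVAE with a diagonal Gaussian posterior and a standard normal prior. It is not taken from the paper: the function names, the flat-list action representation, and the `kl_weight` default are all assumptions chosen for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error over a flat list of action values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def combined_loss(pred_actions, demo_actions, mu, logvar, kl_weight=10.0):
    """Reconstruction (MSE) term plus a weighted KL regulariser.

    kl_weight is a hypothetical value, not one reported in the summary above.
    """
    return mse(pred_actions, demo_actions) + kl_weight * kl_to_standard_normal(mu, logvar)
```

The KL term pulls the encoder's posterior over the style variable toward a zero-mean prior, which is what makes it reasonable to fix the style variable at 0 during inference.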

Abstract

Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains a challenge. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we conducted a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation compared to the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.

Paper Structure (18 sections, 5 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Summary diagram of the proposed immersive VR-based setup used in this work, featuring a VR headset, a haptic feedback glove, a follower robot arm, and a robot hand. (a) illustrates the robot arm and hand system following human demonstrations and providing sensory feedback, (b) depicts the demonstrator remotely controlling the robot, and (c) displays the VR view from the headset.
  • Figure 2: Flowchart of the proposed Haptic-ACT. The observations include RGB images from two cameras, the robot's joint positions, and the fingertip forces of the hand. Note that the transformer encoder (CVAE encoder) operates only during the training phase to compute the style variable for the transformer encoder (CVAE decoder). During the inference phase, the style variable is fixed at 0.
  • Figure 3: Communication and feedback within the immersive VR-based teleoperation system. The Meta Quest 3 captures the user's hand position and orientation, and a digital twin computes inverse kinematics for the real robot arm. Finger joint positions are captured with a SenseGlove and mapped to desired positions for the real robot hand. All commands are published through ROS. Motor values from the real robot hand are translated into fingertip forces, which are then applied to the user via the SenseGlove.
  • Figure 4: Average fingertip force during manipulation. (a) displays the results from the MuJoCo simulator. (b) presents the results from the real-world experiment, where Demo_SG indicates demonstrations with SenseGlove, and Demo_w/o_SG refers to demonstrations without SenseGlove.
  • Figure 5: Comparison of average fingertip force among different demonstrator groups during a real-world pick-and-place task. The results of Student’s $t$-test are indicated: ***$p < 0.001$, **$p < 0.01$, *$p<0.05$.
  • ...and 2 more figures
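
The Figure 5 caption reports significance levels from Student's t-test. As a reminder of what that statistic computes, here is a minimal sketch of the two-sample form with pooled variance. The caption does not say which variant was used, so treating the groups as independent with equal variance is an assumption, and the function name `student_t` is hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes independent groups with equal variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # statistics.variance is the unbiased (n - 1) sample variance.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof
```

The p-value thresholds shown in the caption would then come from comparing the statistic against a t distribution with the returned degrees of freedom, which needs a statistics library beyond the Python standard library.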