MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation

Kelin Yu, Yunhai Han, Qixian Wang, Vaibhav Saxena, Danfei Xu, Ye Zhao

TL;DR

MimicTouch tackles contact-rich manipulation by learning tactile-guided policies from demonstrations that humans perform with their own hands. This removes a mismatch built into vision-driven teleoperation, where the demonstrator relies on vision while the policy must rely on touch. The framework combines four components: collection of multi-modal tactile demonstrations, self-supervised tactile-audio representation learning, non-parametric imitation learning to derive an offline policy, and online residual RL to bridge the human-robot embodiment gap. Experiments show higher data-collection throughput than teleoperation, better offline policy performance than teleoperation-based baselines, and substantial gains from online fine-tuning, with strong zero-shot generalization across diverse insertion and assembly tasks. The approach enables efficient, tactile-centric policy learning, with practical implications for real-world manipulation in occluded and cluttered environments.
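The TL;DR names self-supervised tactile-audio representation learning but does not state the training objective. One common way to align two time-synchronized modalities is a symmetric contrastive (InfoNCE) loss, sketched below with NumPy purely as an illustrative assumption. The function names, the temperature of 0.1, and the convention that row i of each batch is a matching tactile/audio pair are hypothetical, not MimicTouch's published design.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_infonce(tactile_emb: np.ndarray, audio_emb: np.ndarray,
                      temperature: float = 0.1) -> float:
    """Hypothetical contrastive objective: row i of each batch is a positive
    tactile/audio pair; every other row in the batch serves as a negative."""
    t = l2_normalize(tactile_emb)
    a = l2_normalize(audio_emb)
    logits = t @ a.T / temperature      # (batch, batch) scaled cosine similarities
    idx = np.arange(len(t))
    # Tactile -> audio: each row should put its probability mass on the diagonal.
    loss_t2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Audio -> tactile: each column should put its probability mass on the diagonal.
    loss_a2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_t2a + loss_a2t))


# Usage: correctly paired embeddings should score a lower loss than mismatched ones.
rng = np.random.default_rng(0)
tactile = rng.normal(size=(8, 16))
audio_paired = tactile + 0.05 * rng.normal(size=(8, 16))
audio_mismatched = np.roll(audio_paired, 1, axis=0)  # every row now meets a wrong partner
loss_paired = symmetric_infonce(tactile, audio_paired)
loss_mismatched = symmetric_infonce(tactile, audio_mismatched)
```

Minimizing such a loss pulls the embeddings of co-occurring tactile and audio signals together while pushing apart signals from different moments, which is one way to obtain compact representations without labels.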

Abstract

Tactile sensing is critical to fine-grained, contact-rich manipulation tasks such as insertion and assembly. Prior research has shown that tactile-guided policies can be learned from teleoperated demonstration data. However, when providing demonstrations, human users often rely on visual feedback to control the robot. This creates a gap between the sensing modality used to control the robot (visual) and the modality of interest (tactile). To bridge this gap, we introduce "MimicTouch", a novel framework for learning policies directly from demonstrations that human users provide with their own hands. The key innovations are i) a human tactile data collection system that collects a multi-modal tactile dataset for learning humans' tactile-guided control strategies, ii) an imitation learning-based framework for learning these strategies from such data, and iii) an online residual RL framework to bridge the embodiment gap between the human hand and the robot gripper. Through comprehensive experiments, we highlight the efficacy of using humans' tactile-guided control strategies to solve contact-rich manipulation tasks. The project website is at https://sites.google.com/view/MimicTouch.
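The residual idea in item iii) can be illustrated with a minimal sketch, assuming continuous actions (for example, end-effector displacement commands) and a residual that is squashed and scaled so it can only nudge the offline policy. The function `residual_action`, the 0.1 residual scale, and the clipping limit are hypothetical choices for illustration; the abstract does not specify how MimicTouch parameterizes or bounds its residual, and the RL update of the residual policy itself is omitted.

```python
from typing import Callable

import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]


def residual_action(obs: np.ndarray,
                    base_policy: Policy,
                    residual_policy: Policy,
                    residual_scale: float = 0.1,
                    action_limit: float = 1.0) -> np.ndarray:
    """Executed action = frozen offline action + bounded learned correction.

    Only `residual_policy` would be trained online. tanh keeps each residual
    component inside +/- residual_scale, and the final clip enforces the
    (hypothetical) actuator limit.
    """
    base = np.asarray(base_policy(obs), dtype=float)
    correction = residual_scale * np.tanh(np.asarray(residual_policy(obs), dtype=float))
    return np.clip(base + correction, -action_limit, action_limit)


# Usage with toy stand-in policies.
obs = np.zeros(4)
offline = lambda o: np.array([0.5, -0.2, 0.0])
untrained_residual = lambda o: np.zeros(3)               # residual starts at zero
saturated_residual = lambda o: np.array([10.0, -10.0, 0.0])

a_start = residual_action(obs, offline, untrained_residual)   # equals the offline action
a_nudged = residual_action(obs, offline, saturated_residual)  # roughly [0.6, -0.3, 0.0]
```

In this sketch a zero residual reproduces the offline action exactly, so online RL starts from the imitation-learned behavior rather than from scratch, and the bound keeps exploration close to that behavior.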

Paper Structure (30 sections, 8 equations, 13 figures, 3 tables, 1 algorithm)

Figures (13)

  • Figure 1: The first row shows the human tactile demonstrations, including the tactile and proprioception data. The second row shows the robot execution with tactile feedback. The third row, below the dashed line, shows the policy's zero-shot generalization in five domains: variations in hole positions, angles, inner shapes, and materials, plus a different assembly task.
  • Figure 2: Illustration of the MimicTouch Framework. In part (a), we collect the multi-modal human tactile demonstrations. In part (b), we learn compact low-dimensional tactile representations. In part (c), we derive an offline policy through a non-parametric imitation learning method. In part (d), we refine the offline policy through online residual reinforcement learning on a physical robot.
  • Figure 3: Top: Qualitative results for SpaceMouse-based teleoperation, hand-guided teleoperation, and Human Tactile Demonstration policies. Bottom: Visualization of the action serial numbers for three successful rollout trajectories generated by each policy. Solid red lines indicate mean trends, and shaded areas show ± standard deviation. The Reach phase lies to the left of the dashed orange line and the Insertion phase to the right.
  • Figure 4: Left: Demonstrations of the online RL fine-tuning process, which further improves task performance. Right: Quantitative task evaluations of offline policies learned from teleoperation demonstrations (SpaceMouse and HandGuided) and from human tactile demonstrations (Human Tactile) during online RL fine-tuning show that Human Tactile significantly outperforms the others in task success rate and RL training efficiency.
  • Figure 5: Setup of zero-shot generalization tasks.
  • Figures 6 to 13 are not listed here.
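Part (c) of Figure 2 derives the offline policy with a non-parametric imitation learning method, but the captions do not say which one. A common instance of non-parametric imitation is nearest-neighbor retrieval in a learned embedding space, sketched below as an assumption. The class `KNNPolicy`, k = 3, and inverse-distance weighting are hypothetical choices, and the toy embeddings stand in for the compact tactile representations of part (b).

```python
import numpy as np


class KNNPolicy:
    """Hypothetical non-parametric policy: the action for a query embedding is
    the inverse-distance-weighted mean of the actions recorded at its k nearest
    demonstration embeddings. "Training" just stores the demonstrations."""

    def __init__(self, demo_embeddings, demo_actions, k: int = 3, eps: float = 1e-8):
        self.embeddings = np.asarray(demo_embeddings, dtype=float)  # (N, d)
        self.actions = np.asarray(demo_actions, dtype=float)        # (N, action_dim)
        if len(self.embeddings) != len(self.actions):
            raise ValueError("each demonstration embedding needs exactly one action")
        self.k = min(k, len(self.embeddings))
        self.eps = eps  # avoids division by zero for an exact match

    def __call__(self, query) -> np.ndarray:
        query = np.asarray(query, dtype=float)
        dists = np.linalg.norm(self.embeddings - query, axis=1)
        nearest = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[nearest] + self.eps)  # closer demonstrations count more
        weights /= weights.sum()
        return weights @ self.actions[nearest]


# Usage: four stored (embedding, action) pairs; the far-away pair is never retrieved.
demo_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
demo_act = np.array([[0.0, 0.0, -1.0],
                     [0.1, 0.0, -1.0],
                     [0.0, 0.1, -1.0],
                     [1.0, 1.0, 0.0]])
policy = KNNPolicy(demo_emb, demo_act, k=3)
action_at_demo = policy([0.0, 0.0])   # query coincides with the first demonstration
action_between = policy([0.5, 0.5])   # equidistant from the first three demonstrations
```

One consequence of this design is that adding demonstrations requires no retraining, while inference cost grows with the size of the stored dataset.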