Visuotactile-Based Learning for Insertion with Compliant Hands

Osher Azulay, Dhruv Metha Ramesh, Nimrod Curtis, Avishai Sintov

TL;DR

This letter proposes a simulation-based multimodal policy learning framework that leverages all-around tactile sensing and an extrinsic depth camera. Its results emphasize the crucial role of tactile sensing, in conjunction with visual perception, for accurate object-socket pose estimation, successful sim-to-real transfer, and robust task execution.

Abstract

Compared to rigid hands, underactuated compliant hands offer greater adaptability to object shapes, provide stable grasps, and are often more cost-effective. However, their inherent compliance and their lack of the precise finger proprioception available in rigid hands introduce uncertainty in hand-object interactions. These limitations become particularly significant in contact-rich tasks such as insertion, where additional sensing modalities are required for robust performance. This letter explores the essential sensing requirements for successful insertion with compliant hands, focusing on the role of visuotactile perception (i.e., visual and tactile perception). We propose a simulation-based multimodal policy learning framework that leverages all-around tactile sensing and an extrinsic depth camera. A transformer-based policy, trained through a teacher-student distillation process, is successfully transferred to a real-world robotic system without further training. Our results emphasize the crucial role of tactile sensing, in conjunction with visual perception, for accurate object-socket pose estimation, successful sim-to-real transfer, and robust task execution.

Paper Structure

This paper contains 17 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: Tight insertion of an object into a socket with a robotic arm and a three-finger compliant hand without hand proprioception. Two sensing modalities are used: vision provides a rough estimate of the object-socket poses, and tactile sensors on the fingers deliver implicit contact information. A policy is first trained in simulation and then deployed zero-shot on the real system.
  • Figure 2: Overview illustration of the training steps in simulation. First, a teacher policy is trained using privileged information. Subsequently, a distillation process is employed to train a student policy that learns to imitate the teacher's behavior, relying solely on visuotactile data and End-Effector (EE) pose.
  • Figure 3: Tactile images from (left) simulated and (right) real compliant hand with an AllSight sensor grasping similar objects.
  • Figure 4: A schematic illustrating the deployment of the student policy, trained in simulation, onto the real-world robot. The policy receives observation and visuotactile sensory data and generates control signals to actuate the robot arm.
  • Figure 5: Ablation study of insertion success rate in simulation for policies trained with different sensory modalities, as a function of the number of training steps.
  • ...and 2 more figures
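
The teacher-student distillation described in the abstract and in the Figure 2 caption can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the paper's student is a transformer that consumes visuotactile data and the EE pose, whereas here everything is a hypothetical 1-D stand-in. The teacher sees a privileged scalar (the true object-socket offset), the student sees only a noisy observation of it, and the student is a linear model fitted by stochastic gradient descent to imitate the teacher's actions. The names `teacher_action` and `distill`, the gain, the noise level, and the learning rate are all assumptions chosen for illustration.

```python
import random


def teacher_action(offset, gain=0.5):
    """Hypothetical teacher: acts on the privileged true object-socket offset."""
    return -gain * offset


def distill(steps=2000, lr=0.05, noise_std=0.05, seed=0):
    """Fit a linear student (action = w * obs + b) to imitate the teacher.

    The student never sees the true offset, only a noisy observation that
    stands in for the visuotactile input (an assumption of this sketch).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        offset = rng.uniform(-1.0, 1.0)            # privileged state (simulation only)
        obs = offset + rng.gauss(0.0, noise_std)   # what the student observes
        target = teacher_action(offset)            # teacher's action is the label
        err = (w * obs + b) - target
        losses.append(err * err)                   # squared imitation error
        # One SGD step on the squared imitation error.
        w -= lr * 2.0 * err * obs
        b -= lr * 2.0 * err
    return w, b, losses


w, b, losses = distill()
early = sum(losses[:100]) / 100    # mean imitation loss over the first 100 steps
late = sum(losses[-100:]) / 100    # mean imitation loss over the last 100 steps
print(f"w={w:.3f} b={b:.3f} early_loss={early:.4f} late_loss={late:.4f}")
```

In this toy setting the student's weight should approach the teacher's gain (with opposite sign) as the imitation loss falls, which mirrors the idea that a policy restricted to deployable observations can recover behavior learned with privileged simulation state.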