Learning Fine Pinch-Grasp Skills using Tactile Sensing from A Few Real-world Demonstrations

Xiaofeng Mao, Yucheng Xu, Ruoshi Wen, Mohammadreza Kasaei, Wanming Yu, Efi Psomopoulou, Nathan F. Lepora, Zhibin Li

TL;DR

The paper tackles data-efficient imitation learning for dexterous, dual-arm pinch grasping by leveraging rich tactile sensing from TacTip sensors. It introduces a convolutional autoencoder to extract compact tactile features and fuses these with proprioceptive information in a behavior cloning framework to learn contact-aware sensorimotor policies from a few real demonstrations. Key contributions include demonstrating robust generalization to unseen objects, resilience to external disturbances, and re-grasping capabilities, as well as interpretability via saliency analysis. The results show high success rates and clear advantages of tactile-enabled fusion over baselines, highlighting practical potential for contact-rich manipulation without heavy visual cues.

Abstract

Imitation learning for robot dexterous manipulation, especially with a real robot setup, typically requires a large number of demonstrations. In this paper, we present a data-efficient learning-from-demonstration framework that exploits rich tactile sensing data and achieves fine bimanual pinch grasping. Specifically, we employ a convolutional autoencoder network that can effectively extract and encode high-dimensional tactile information. Further, we develop a framework that achieves efficient multi-sensor fusion for imitation learning, allowing the robot to learn contact-aware sensorimotor skills from demonstrations. Our comparison study against the framework without encoded tactile features highlighted the effectiveness of incorporating rich contact information, which enabled dexterous bimanual grasping with active contact searching. Extensive experiments demonstrated the robustness of the fine pinch grasp policy learned directly from few-shot demonstrations, including grasping the same object from different initial poses, generalizing to ten unseen objects, grasping firmly and robustly against external pushes, and contact-aware, reactive re-grasping when objects are dropped under very large perturbations. Furthermore, saliency map analysis is used to describe the weight distribution across modalities during pinch grasping, confirming the effectiveness of our framework at leveraging multimodal information.
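
The fusion and saliency ideas in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the convolutional autoencoder is replaced by random stand-in latent vectors, the neural policy by a linear ridge-regression policy, and all dimensions, names (`fuse`, `fit_linear_policy`, `modality_saliency`) and data are hypothetical. It only shows how encoded tactile features and proprioception might be concatenated, fitted to demonstrated actions, and attributed per modality.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration (not from the paper).
LATENT_DIM = 8    # size of each encoded tactile feature vector
PROPRIO_DIM = 6   # size of the proprioceptive state
ACTION_DIM = 4    # size of the commanded action


def fuse(tactile_left, tactile_right, proprio):
    """Concatenate per-modality feature vectors into one policy input."""
    return np.concatenate([tactile_left, tactile_right, proprio], axis=-1)


def fit_linear_policy(observations, actions, ridge=1e-3):
    """Behavior cloning as ridge regression: minimise ||obs @ W - actions||^2."""
    d = observations.shape[1]
    gram = observations.T @ observations + ridge * np.eye(d)
    return np.linalg.solve(gram, observations.T @ actions)


def modality_saliency(weights):
    """Share of the total absolute input gradient attributed to each modality.

    For a linear policy the gradient of the action with respect to the
    input is the weight matrix itself, so no autodiff is needed here.
    """
    per_input = np.abs(weights).sum(axis=1)
    slices = {
        "tactile_left": slice(0, LATENT_DIM),
        "tactile_right": slice(LATENT_DIM, 2 * LATENT_DIM),
        "proprio": slice(2 * LATENT_DIM, 2 * LATENT_DIM + PROPRIO_DIM),
    }
    total = per_input.sum()
    return {name: float(per_input[s].sum() / total) for name, s in slices.items()}


# Synthetic stand-in for demonstration data (assumption: no real data used).
rng = np.random.default_rng(0)
n = 200
obs = fuse(
    rng.normal(size=(n, LATENT_DIM)),   # would come from a tactile encoder
    rng.normal(size=(n, LATENT_DIM)),
    rng.normal(size=(n, PROPRIO_DIM)),
)
true_w = rng.normal(size=(obs.shape[1], ACTION_DIM))
acts = obs @ true_w                     # pretend these are demonstrated actions

policy_w = fit_linear_policy(obs, acts)
saliency = modality_saliency(policy_w)
```

With a nonlinear policy, the same per-modality attribution would require computing input gradients at each time step, which is what lets a saliency analysis track how the modality weighting changes over the course of a grasp.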

Paper Structure (20 sections, 4 equations, 5 figures)

Figures (5)

  • Figure 1: Autonomous dexterous grasping with soft tactile sensors, including pre-grasp, press, roll-lift, and firm grasp.
  • Figure 2: Architecture detailing the teleoperation system for demonstrations and the LfD framework.
  • Figure 3: Generalization of the learned policy and its robustness to external disturbance.
  • Figure 4: Results of the comparison study. The policies trained with both comparison frameworks bypass the object, manoeuvring the end-effectors directly to the desired end poses without making any physical contact, grasping attempts, or interactions with the tube.
  • Figure 5: The output of the learned policy and the weight changes during the grasping.