Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

Shoujie Li, Haixin Yu, Wenbo Ding, Houde Liu, Linqi Ye, Chongkun Xia, Xueqian Wang, Xiao-Ping Zhang

TL;DR

This work tackles robust grasping of transparent objects under complex backgrounds and varying lighting by integrating vision and touch through the TaTa gripper. It introduces SimTrans12K, a Gaussian-Mask annotation scheme, the TGCNN grasp-detection network, a tactile feature extractor, and a visual-tactile classifier, supplemented by THS and TPE modules for challenging scenes. Tactile calibration improves the grasping success rate by 36.7% compared to direct grasping, and visual-tactile fusion improves classification accuracy by 34%. The approach is validated in planar, irregular, and underwater scenes, including stacking and fragmentation cases. The framework improves perception in low-visibility environments and shows practical potential for robust transparent-object manipulation in real-world robotics.

Abstract

Accurate detection and grasping of transparent objects is challenging but important for robots. Here, a visual-tactile fusion framework for transparent object grasping under complex backgrounds and varying lighting conditions is proposed, comprising grasping position detection, tactile calibration, and visual-tactile fusion based classification. First, a multi-scene synthetic grasping dataset generation method with Gaussian-distribution-based data annotation is proposed. In addition, a novel grasping network named TGCNN is proposed for grasping position detection, showing good results in both synthetic and real scenes. In tactile calibration, inspired by human grasping, a fully convolutional network based tactile feature extraction method and a central-location-based adaptive grasping strategy are designed, improving the success rate by 36.7% compared to direct grasping. Furthermore, a visual-tactile fusion method is proposed for transparent object classification, which improves the classification accuracy by 34%. The proposed framework combines the advantages of vision and touch and greatly improves the grasping efficiency of transparent objects.
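As a rough illustration of what Gaussian-distribution-based annotation can look like, the sketch below builds a grasp-quality map that peaks at a labeled grasp center and decays smoothly with distance. It is a minimal sketch under stated assumptions, not the paper's annotation code: the function name `gaussian_grasp_mask`, the isotropic Gaussian, and the `sigma` values are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_grasp_mask(height, width, center_xy, sigma=8.0):
    """Return a (height, width) map that equals 1.0 at the grasp center.

    Hypothetical helper: an isotropic 2D Gaussian is assumed here; the
    paper's actual annotation scheme may differ.
    """
    cx, cy = center_xy
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width]
    sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Example: a 64x96 label map with the grasp center at column 40, row 20.
mask = gaussian_grasp_mask(64, 96, center_xy=(40, 20), sigma=6.0)
peak_row, peak_col = np.unravel_index(np.argmax(mask), mask.shape)
```

Compared with a hard binary label, a soft map of this kind gives a detection network a graded target around the center, which is one common motivation for Gaussian-style labels.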

Paper Structure

This paper contains 25 sections, 4 equations, 27 figures, and 7 tables.

Figures (27)

  • Figure 1: The visual-tactile fusion framework inspired by human grasping.
  • Figure 2: Examples of transparent object datasets. (A) ClearGrasp (Sajjan et al., 2020). (B) Dex-NeRF (Ichnowski et al., 2021). (C) LIT (Zhou et al., 2020). (D) Light-field camera used in the LIT dataset.
  • Figure 3: Detection with RGB and depth cameras. Left: undulating scenes: RGB (A) and Depth (B) images; Right: underwater scenes: RGB (A) and Depth (B) images.
  • Figure 4: Hardware system. (A) The structure of TaTa: (a) The schematic diagram of TaTa, (b) The layout of the inside LEDs, (c) The illustration of the inside light path. (B) Coordinate system (CS). (C) Visual-tactile fusion grasping experimental platform. (D) Tactile perception effect test: (a) Screwdriver picture, (b) Perception result. (E) Grasping performance testing: (a) Grasp an egg, (b) Grasp a tomato.
  • Figure 5: The visual-tactile fusion framework for transparent object grasping. (A) Grasping position detection. (B) Tactile information extraction. (C) Visual-tactile fusion classification.
  • ...and 22 more figures