TransDex: Pre-training Visuo-Tactile Policy with Point Cloud Reconstruction for Dexterous Manipulation of Transparent Objects

Fengguan Li; Yifan Ma; Chen Qian; Wentao Rao; Weiwei Shang

Abstract

Dexterous manipulation enables complex tasks, but manipulating transparent objects is hampered by self-occlusion, severe depth noise, and loss of depth information. To address this problem, this paper proposes TransDex, a 3D visuo-tactile fusion motor policy based on point cloud reconstruction pre-training. Specifically, we first propose a self-supervised, Transformer-based point cloud reconstruction pre-training approach. This method accurately recovers the 3D structure of objects from the interaction point clouds of dexterous hands, even when random noise and large-scale masking are applied. Building on this, we construct TransDex, in which perceptual encoding adopts a fine-grained hierarchical scheme and multi-round attention mechanisms adaptively fuse features of the robotic arm and dexterous hand to enable differentiated motion prediction. Transparent object manipulation experiments on a real robotic system demonstrate that TransDex outperforms existing baseline methods. Further analysis validates the generalization capabilities of TransDex and the effectiveness of its individual components.

Paper Structure (32 sections, 11 equations, 5 figures, 3 tables)

Introduction
Related Work
Visuo-Tactile Fusion
Pre-training for Robotics
Robotic Manipulation for Transparent Objects
Method
Pre-training Based on Point Cloud Reconstruction
Pre-training Dataset
Grouping and Random Masking
Point Cloud Feature Extraction
Query Point Generation and Decoder
Visuo-Tactile Fusion Motor Policy
Point Cloud Processing
Perceptual Encoding
Modality Fusion Module
...and 17 more sections

Figures (5)

Figure 1: The Overall Framework of the Pre-Training-Based Visuo-Tactile Fusion Motor Policy, TransDex. Perceptual information enters the encoder in a fine-grained, hierarchical manner. The hand-object interaction point cloud is processed by a pre-trained encoder, followed by an attention fusion module and two policy heads that perform feature integration and differentiated action prediction.
Figure 2: Pre-Training Framework. First, the original hand-object interaction point cloud is perturbed with noise and randomly masked to generate the input data. Then, features are extracted using a pre-encoder and a Transformer encoder. Finally, the generated query points and a Transformer decoder are used to reconstruct the point cloud of the object.
Figure 3: Robotic System Setup: (1) a 16-DOF dexterous hand with tactile sensor arrays on the fingertips and finger pads; (2) a 7-DOF robotic arm; (3) depth cameras; (4) experimental items; (5) a data glove; (6) a motion capture camera.
Figure 4: Visualization of Policy Rollouts on Three Transparent Object Manipulation Tasks: pouring, shaking, and rotating. The unseen objects used in testing, as well as the complex backgrounds and lighting conditions, are shown at the bottom.
Figure 5: Visualization of Point Cloud Reconstruction Performance in Pre-Training Tasks and Real-World Transfer Reconstruction Outcomes.
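
The input-generation step described in the Figure 2 caption (noise added to the hand-object interaction point cloud, followed by masking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `corrupt_point_cloud`, the Gaussian noise model, the uniformly random choice of group centers, the nearest-neighbour grouping, and all default parameter values are assumptions made only for illustration.

```python
import numpy as np

def corrupt_point_cloud(points, num_groups=8, group_size=16,
                        mask_ratio=0.6, noise_std=0.01, seed=0):
    """Add noise, split the cloud into local groups, and hide a fraction of them.

    Hypothetical helper: every parameter and default here is an assumption.
    """
    rng = np.random.default_rng(seed)
    # Jitter every point with zero-mean Gaussian noise (assumed noise model).
    noisy = points + rng.normal(0.0, noise_std, size=points.shape)
    # Pick group centers uniformly at random (assumed sampling scheme).
    center_idx = rng.choice(len(noisy), size=num_groups, replace=False)
    centers = noisy[center_idx]
    # For each center, gather its group_size nearest neighbours.
    dists = np.linalg.norm(noisy[None, :, :] - centers[:, None, :], axis=-1)  # (G, N)
    knn_idx = np.argsort(dists, axis=1)[:, :group_size]                       # (G, K)
    groups = noisy[knn_idx]                                                   # (G, K, 3)
    # Mark a fixed fraction of the groups as masked.
    num_masked = int(round(mask_ratio * num_groups))
    masked = np.zeros(num_groups, dtype=bool)
    masked[rng.choice(num_groups, size=num_masked, replace=False)] = True
    # Only the visible groups would be passed on to an encoder.
    return groups[~masked], centers, masked

# Usage on a synthetic cloud of 256 points in the unit cube.
cloud = np.random.default_rng(1).uniform(-1.0, 1.0, size=(256, 3))
visible_groups, centers, masked = corrupt_point_cloud(cloud)
```

In a reconstruction pre-training setup of this kind, the visible groups would feed the encoder while the masked groups define what the decoder has to recover; how TransDex itself selects centers and masking ratios is not specified in this summary.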
