Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Irmak Guzey; Ben Evans; Soumith Chintala; Lerrel Pinto

TL;DR

T-Dex addresses dexterous manipulation with multi-fingered hands in two steps: it first learns self-supervised tactile representations from play data, then performs few-shot, non-parametric imitation that combines tactile and visual information. Tactile pre-training substantially improves performance over vision-only and torque-based baselines across five contact-rich tasks, with further gains as the amount of diverse play data grows. Key contributions include (i) a tactile pre-training pipeline that applies BYOL to play data recorded from the tactile sensors on the robot hand, (ii) a nearest-neighbor imitation framework that combines tactile and visual features, and (iii) ablations on the importance of tactile representations, play data, and input architecture. The results show improved dexterous manipulation under visual occlusion and point toward data-efficient tactile-vision policies for real-world robots.
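As a rough illustration of the BYOL objective mentioned above, the sketch below shows the two ingredients of the general BYOL recipe: a normalized regression loss between the online network's prediction and the target network's projection, and an exponential-moving-average (EMA) update of the target weights. The function names, the flat-list representation of vectors and parameters, and the default `tau` are assumptions made for illustration; this is not the T-Dex training code, and the tactile encoder and augmentations are omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def byol_loss(online_prediction, target_projection):
    """Normalized MSE between two embeddings, equal to 2 - 2 * cosine similarity.

    In BYOL the target projection is treated as a constant (no gradient flows
    through it); that detail is not modeled in this plain-Python sketch.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters toward the online parameters.

    tau is a hypothetical decay value chosen for illustration.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

The loss is zero when the two embeddings point in the same direction and grows as they diverge, so it depends only on direction and not on embedding magnitude.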

Abstract

Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics. Most prominent work in this area focuses on learning controllers or policies that operate either on visual observations or on state estimates derived from vision. However, such methods perform poorly on fine-grained manipulation tasks that require reasoning about contact forces or about objects occluded by the hand itself. In this work, we present T-Dex, a new approach for tactile-based dexterity that operates in two phases. In the first phase, we collect 2.5 hours of play data, which is used to train self-supervised tactile encoders. This step is necessary to map high-dimensional tactile readings to a lower-dimensional embedding. In the second phase, given a handful of demonstrations for a dexterous task, we learn non-parametric policies that combine the tactile observations with visual ones. Across five challenging dexterous tasks, we show that our tactile-based dexterity models outperform purely vision-based and torque-based models by an average of 1.7×. Finally, we provide a detailed analysis of factors critical to T-Dex, including the importance of play data, architectures, and representation learning.
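The second phase can be pictured as a lookup over demonstration frames. The sketch below is a minimal nearest-neighbor policy under stated assumptions: each demonstration frame stores a tactile embedding, a visual embedding, and the action taken, and the query is matched by a weighted sum of Euclidean distances. The dictionary keys, the distance metric, and the `tactile_weight` and `visual_weight` parameters are hypothetical choices for illustration; the abstract states only that the policy is non-parametric and combines tactile with visual observations.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_action(query_tactile, query_visual, demos,
                            tactile_weight=1.0, visual_weight=1.0):
    """Return the action of the demonstration frame closest to the query.

    demos is a list of dicts with keys "tactile", "visual", and "action"
    (an assumed layout). Closeness is a weighted sum of the tactile and
    visual embedding distances.
    """
    if not demos:
        raise ValueError("at least one demonstration frame is required")

    def combined_distance(frame):
        return (tactile_weight * euclidean(query_tactile, frame["tactile"])
                + visual_weight * euclidean(query_visual, frame["visual"]))

    return min(demos, key=combined_distance)["action"]


# Toy demonstration set with two frames and symbolic actions.
demos = [
    {"tactile": [0.0, 0.0], "visual": [0.0, 0.0], "action": "open"},
    {"tactile": [1.0, 1.0], "visual": [1.0, 1.0], "action": "close"},
]
action = nearest_neighbor_action([0.9, 1.0], [1.0, 0.8], demos)
```

Because no policy network is trained, adding a demonstration only extends the lookup set, which is one reason such an approach can work with a handful of demonstrations. The relative weights decide how much each modality influences the match.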

Paper Structure (40 sections, 21 figures, 3 tables)

Introduction
Related Work
Dexterous Manipulation
Tactile Sensing
Representation Learning for Robotics
Exploratory and Play Data
Offline Imitation Learning
System Details and Robot Setup
Tactile-Based Dexterity (T-Dex)
Phase I: Pre-Training Tactile Representations from Play
Phase II: Non-parametric Learning
Experiments
Description of Dexterous Tasks
Joystick Movement
Bottle Opening
...and 25 more sections

Figures (21)

Figure 1: T-Dex learns dexterous policies from high-dimensional tactile sensors on a multi-fingered robot hand (top). Combined with vision, our tactile representations are crucial to learn fine-grained manipulation tasks (bottom).
Figure 2: Hardware setup of T-Dex. We use an Oculus headset to teleoperate the Allegro hand and the built-in Kinova joystick to control the arm. Visual observations are streamed from two RealSense cameras, and tactile observations are recorded with XELA touch sensors on the Allegro hand.
Figure 3: Visualization of some of the play tasks. We play with grasping, pinching, moving objects, and other in-hand manipulation tasks.
Figure 4: An overview of the T-Dex framework. Left: we train tactile representations using BYOL on a large play dataset. Right: we leverage the learned representations using nearest neighbors imitation.
Figure 5: Visualization of robot rollouts from T-Dex policies. Note the severe visual occlusions when the robot makes contact with the object.
...and 16 more figures
