PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

Rosy Chen; Mustafa Mukadam; Michael Kaess; Tingfan Wu; Francois R Hogan; Jitendra Malik; Akash Sharma

TL;DR

PTLD is a novel approach to learning tactile manipulation skills without tactile simulation. It uses privileged sensors in the real world to collect real-world tactile policy data, then distills a robust state estimator that operates on tactile input.

Abstract

Tactile dexterous manipulation is essential to automating complex household tasks, yet learning effective control policies remains a challenge. While recent work has relied on imitation learning, obtaining high-quality demonstrations for multi-fingered hands via robot teleoperation or kinesthetic teaching is prohibitive. Alternatively, with reinforcement learning we can learn skills in simulation, but fast and realistic simulation of tactile observations is challenging. To bridge this gap, we introduce PTLD: sim-to-real Privileged Tactile Latent Distillation, a novel approach to learning tactile manipulation skills without requiring tactile simulation. Instead of simulating tactile sensors or relying purely on proprioceptive policies to transfer zero-shot sim-to-real, our key idea is to leverage privileged sensors in the real world to collect real-world tactile policy data. This data is then used to distill a robust state estimator that operates on tactile input. Our experiments show that PTLD can significantly improve proprioceptive manipulation policies trained in simulation by incorporating tactile sensing. On the benchmark in-hand rotation task, PTLD achieves a 182% improvement over a proprioception-only policy. We also show that PTLD enables learning the challenging task of tactile in-hand reorientation, where we see a 57% improvement in the number of goals reached over using proprioception alone. Website: https://akashsharma02.github.io/ptld-website/.

Paper Structure (37 sections, 4 equations, 13 figures, 5 tables)

Introduction
Related work
Dexterous In-hand Manipulation
Privileged distillation
Tactile sensing and Representation learning
Background
Notation
Privileged latent distillation
Asymmetric Actor Critic
PTLD: Privileged Tactile Latent Distillation
Online distillation with Asymmetric Actor Critic
Privileged sensors for Sim-to-Real Tactile Distillation
The privileged tactile manipulation system
Real world privileged sensor cell
Robot Setup
...and 22 more sections

Figures (13)

Figure 1: PTLD (sim-to-real Privileged Tactile Latent Distillation) is an approach to learning tactile dexterous policies without simulating tactile sensors. First, privileged sensor policies are trained in simulation using reinforcement learning, which produces strong policies. These policies are deployed in instrumented real-world setups to collect tactile demonstrations. Finally, a tactile state estimator is trained from the tactile demonstrations to obtain robust, real-world deployable tactile policies. With PTLD, we demonstrate that in-hand rotation is robust to changes in slip, mass, and wrist orientation, and that performance on the challenging task of in-hand reorientation improves by over 57% with tactile sensing compared to a proprioception-only policy.
Figure 2: (left) Privileged latent distillation is a two-stage approach to training policies in simulation. An oracle policy with privileged information is trained in stage 1, then distilled into a deployable policy in stage 2 (in simulation). (right) Asymmetric Actor Critic is a single-stage approach in which two networks, an actor and a critic, are trained simultaneously. The critic is provided with privileged information and learns the value function, while the actor is given only deployable partial sensor information.
Figure 3: A simplified illustration of PTLD. Once we have a privileged sensor policy trained in simulation using AAC, we first collect demonstrations in the real world by deploying the policy, recording deployment sensor observations alongside. Then, using this offline dataset, we train a deployment encoder (a tactile encoder in this case) to recover the latents of the privileged sensor policy.
Figure 4: Visualization of tactile observations and the latents changing over the first 1 second of privileged sensor policy deployment. Here we visualize only the tactile data at the robot fingertip for simplicity; however, the tactile encoder takes as input all observations from the hand.
Figure 5: Tactile encoders for manipulation tasks: a) For in-hand rotation, we concatenate a history of tactile signals and sensor positions, and encode them using a 1D temporal convolution network to predict the tactile latents. b) For in-hand reorientation, we concatenate tactile signals, proprioception, goal orientations, and past latents, and embed them with a causal transformer to produce future tactile latents.
...and 8 more figures
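
The offline distillation step described in the Figure 3 caption (training a deployment encoder to recover the latents of the privileged sensor policy) can be illustrated with a toy regression. The sketch below is not the paper's implementation: it assumes a plain linear encoder trained with a mean-squared-error loss on synthetic data, whereas the paper uses a temporal convolution network or a causal transformer on real tactile recordings. The names `distill_latents`, `tactile_obs`, and `target_latents`, and all array sizes, are hypothetical.

```python
import numpy as np


def distill_latents(tactile_obs, target_latents, lr=0.5, steps=200):
    """Fit a linear encoder so that tactile_obs @ W + b approximates target_latents.

    Hypothetical stand-in for a tactile encoder: the targets play the role of
    latents logged while the privileged sensor policy was deployed.
    """
    d_in = tactile_obs.shape[1]
    d_out = target_latents.shape[1]
    W = np.zeros((d_in, d_out))
    b = np.zeros(d_out)
    losses = []
    for _ in range(steps):
        pred = tactile_obs @ W + b
        err = pred - target_latents
        # Mean squared error over every element of the latent batch.
        losses.append(float(np.mean(err ** 2)))
        # Gradients of that mean squared error with respect to W and b.
        grad_W = 2.0 * tactile_obs.T @ err / err.size
        grad_b = 2.0 * err.sum(axis=0) / err.size
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b, losses


# Synthetic offline dataset (assumption): 256 samples, 8 tactile features,
# 4-dimensional privileged latents generated by an unknown linear map.
rng = np.random.default_rng(0)
tactile_obs = rng.normal(size=(256, 8))
target_latents = tactile_obs @ rng.normal(size=(8, 4))

W, b, losses = distill_latents(tactile_obs, target_latents)
```

In this toy setting the loss history in `losses` should fall steadily, since the targets are an exact linear function of the inputs. With real tactile data the mapping is not linear, which is one reason to prefer a sequence model over a history of observations.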
