TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment

Youngsun Wi, Jessica Yin, Elvis Xiang, Akash Sharma, Jitendra Malik, Mustafa Mukadam, Nima Fazeli, Tess Hellebrekers

TL;DR

This work proposes TactAlign, a cross-embodiment tactile alignment method that uses a rectified flow to transfer human-collected tactile signals to a robot with a different embodiment, and enables zero-shot H2R transfer on a highly dexterous task (light bulb screwing).

Abstract

Human demonstrations collected by wearable devices (e.g., tactile gloves) provide fast and dexterous supervision for policy learning, and are guided by rich, natural tactile feedback. However, a key challenge is how to transfer human-collected tactile signals to robots despite the differences in sensing modalities and embodiment. Existing human-to-robot (H2R) approaches that incorporate touch often assume identical tactile sensors, require paired data, and involve little to no embodiment gap between the human demonstrator and the robot, limiting scalability and generality. We propose TactAlign, a cross-embodiment tactile alignment method that transfers human-collected tactile signals to a robot with a different embodiment. TactAlign transforms human and robot tactile observations into a shared latent representation using a rectified flow, without paired datasets, manual labels, or privileged information. Our method enables low-cost latent transport guided by pseudo-pairs derived from hand-object interactions. We demonstrate that TactAlign improves H2R policy transfer across multiple contact-rich tasks (pivoting, insertion, lid closing), generalizes to unseen objects and tasks with less than 5 minutes of human data, and enables zero-shot H2R transfer on a highly dexterous task (light bulb screwing).

Paper Structure (56 sections, 17 equations, 16 figures, 5 tables)

Figures (16)

  • Figure 1: We propose TactAlign, a cross-sensor tactile alignment method for cross-embodiment human-to-robot policy transfer. Given unpaired human (tactile glove) and robot demonstrations, TactAlign uses a rectified flow to map glove tactile features into the robot tactile space. This alignment enables effective tactile policy co-training on pivoting, insertion, and lid closing tasks. With only a few minutes of human demonstrations, the resulting policies generalize to unseen object instances. Importantly, the same learned alignment can be reused to train policies on unseen tasks. We also demonstrate zero-shot human-to-robot dexterous manipulation on a light bulb screwing task.
  • Figure 2: Tactile Alignment Overview. Our method consists of two stages: self-supervised representation learning and cross-embodiment alignment via pseudo-pairs. In the first stage, a learnable length-1 query between the encoder and decoder produces a fixed-dimensional latent representation via cross-attention pooling. In the second stage, we aggregate the learned latents from both domains to construct pseudo-pairs $(h^*, r^*)$ and learn a velocity field $v_\theta$ that transports the glove latent distribution to the robot latent distribution.
  • Figure 3: Red and blue indicate two subsets of the source distribution; training uses the provided pairs (lines), with colors preserved at $\alpha = 0.2$ for the target samples associated with each pair (left of each panel) and their transformed targets (right of each panel). First: Standard rectified flow [liuflow] learns a low-cost transport between two distributions by training on randomly sampled pairs. Second: We propose supplying pseudo-pairs to the rectified flow to guide the velocity field toward desired correspondences between the source and target distributions. Third: Despite noise in the pseudo-pairs, the learned rectified flow remains robust and converges to an efficient transport map between the two distributions.
  • Figure 4: H2R Action Policy. Given either human or robot inputs, the shared policy follows a color-coded structure, representing robot, human, and shared modules. Human glove latent features are transported by an ODE solver that integrates the learned velocity field. The proprioceptive encoder takes fingertip locations (yellow dots) and wrist orientation as input. Only the yellow modules are trained; all others are frozen.
  • Figure 5: Tactile Features UMAP Projections. First: Rectified flow maps the glove latent distribution to overlap with the robot distribution. Second & Third: Colors denote normalized raw tactile magnitude (0: no contact, 1: highest force/shear), computed separately for glove and robot data. As indicated by the arrows, the alignment exhibits a consistent cross-domain trend in contact force magnitudes, even though force is not used during training.
  • ...and 11 more figures
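
The pseudo-pair-guided rectified flow described in the captions of Figures 2 to 4 can be illustrated with a toy sketch. This is not the authors' implementation. It assumes synthetic 2-D latents, a linear velocity field fit by least squares in place of a neural network $v_\theta$, and a fixed-step Euler solver. The function names `fit_velocity_field` and `transport`, and all numeric settings, are hypothetical. The sketch shows only the general recipe: interpolate along straight lines between pseudo-pairs $(h^*, r^*)$, regress the velocity onto $r^* - h^*$, then integrate the learned field to move glove latents toward the robot distribution.

```python
import numpy as np

def fit_velocity_field(h_star, r_star, n_time_samples=16, seed=0):
    """Fit a linear velocity field v(x, t) = [x, t, 1] @ W on pseudo-pairs.

    Assumption: a linear model stands in for the learned network v_theta.
    """
    rng = np.random.default_rng(seed)
    feats, targets = [], []
    for _ in range(n_time_samples):
        t = rng.uniform(0.0, 1.0, size=(len(h_star), 1))
        x_t = (1.0 - t) * h_star + t * r_star   # point on the straight path
        feats.append(np.hstack([x_t, t, np.ones_like(t)]))
        targets.append(r_star - h_star)         # rectified-flow regression target
    W, *_ = np.linalg.lstsq(np.vstack(feats), np.vstack(targets), rcond=None)
    return W

def transport(h, W, n_steps=10):
    """Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x = h.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((len(x), 1), k * dt)
        x = x + dt * (np.hstack([x, t, np.ones_like(t)]) @ W)
    return x

# Synthetic stand-ins for glove and robot latents (not real tactile data).
rng = np.random.default_rng(0)
glove = rng.normal(size=(512, 2))
shift = np.array([3.0, -1.0])
# Noisy pseudo-pairs: each glove latent is matched to a perturbed robot latent.
robot = glove + shift + 0.05 * rng.normal(size=(512, 2))

W = fit_velocity_field(glove, robot)
aligned = transport(glove, W)
print("robot mean:  ", robot.mean(axis=0))
print("aligned mean:", aligned.mean(axis=0))
```

In this toy setting the pairs supply the coupling directly, so the fitted field is close to a constant shift. With randomly coupled samples, as in standard rectified flow, the same regression would still match the two distributions but would not preserve which source sample maps to which target.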