Cross-Sensor Touch Generation

Samanta Rodriguez, Yiming Dou, Miquel Oller, Andrew Owens, Nima Fazeli

TL;DR

This work tackles the challenge of diverse visuo-tactile sensors by proposing cross-sensor tactile generation methods that enable models trained on one sensor to operate on others. It introduces two pipelines: a one-stage diffusion approach trained on paired data, called Touch2Touch (T2T), and a two-stage, depth-mediated approach called Touch-to-Depth-to-Touch (T2D2) that works with unpaired data. The authors validate these methods on in-hand pose estimation and behavior cloning tasks, demonstrating successful transfer across Soft Bubble, GelSlim, and DIGIT sensors, with T2T delivering higher fidelity and T2D2 offering greater data efficiency when adding new sensors. The results reveal a trade-off between fidelity and data flexibility, highlighting the potential of sensor-interoperable tactile systems for reusable downstream perception and control pipelines.

Abstract

Today's visuo-tactile sensors come in many shapes and sizes, making it challenging to develop general-purpose tactile representations. This is because most models are tied to a specific sensor design. To address this challenge, we propose two approaches to cross-sensor image generation. The first is an end-to-end method that leverages paired data (Touch2Touch). The second method builds an intermediate depth representation and does not require paired data (T2D2: Touch-to-Depth-to-Touch). Both methods enable the use of sensor-specific models across multiple sensors via the cross-sensor touch generation process. Together, these models offer flexible solutions for sensor translation, depending on data availability and application needs. We demonstrate their effectiveness on downstream tasks such as in-hand pose estimation and behavior cloning, successfully transferring models trained on one sensor to another. Project page: https://samantabelen.github.io/cross_sensor_touch_generation.

Paper Structure

This paper contains 22 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Transferring manipulation skills between touch sensors via cross-modal prediction. We execute a manipulation skill designed for one touch sensor (Soft Bubble) on a robot equipped with a different sensor (GelSlim). We demonstrate two approaches to translating one touch signal into another; that is, we predict what the object would have felt like if it were manipulated with Soft Bubble rather than GelSlim. The predicted signal is then used for the downstream skill.
  • Figure 2: Translating signals between touch sensors. We investigate two different approaches to cross-sensor touch translation. (a) We train a latent diffusion model to directly predict one sensor's signal from another's, using paired training data. (b) We use depth as an intermediate representation, thus avoiding the need for paired training data. We predict depth from touch, adapt the depth map to match the specifications of another sensor, then generate a touch signal from the resulting depth map. We use the resulting touch translation models for robotic manipulation tasks.
  • Figure 3: Generation Qualitative Results. Qualitative results for unseen grasps and tools using T2T and T2D2. Rows indicate sensor transfer directions (B → G, G → B); columns show input, model outputs, and ground truth.
  • Figure 4: T2D2 Qualitative Results. Evaluation of the T2D2 model on unseen grasps and tools. Each block shows the input, adapted depth map, generated tactile output, and ground truth (GT) for various sensor transfers.
  • Figure 5: Marble rolling policy transfer via T2D2. We train a behavior-cloning policy on GelSlim tactile images to roll a marble from random starts to the image center. At test time on DIGIT, we translate each DIGIT tactile signature to its GelSlim counterpart with T2D2 and run the same policy unchanged. Left: the pipeline, DIGIT → (T2D2) → GelSlim → GelSlim-trained policy. Right: transferred roll-outs on DIGIT converge to the center (zero-shot; no retraining).
  • ...and 2 more figures
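
The depth-mediated pipeline in Figure 2(b) and the policy transfer in Figure 5 can be read as plain function composition. The sketch below is only an illustration of that structure under stated assumptions: every name in it (`compose_t2d2`, `wrap_policy`, and the `toy_*` stand-ins) is hypothetical, and the three stages are trivial NumPy placeholders rather than the learned models used in the paper.

```python
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


def compose_t2d2(
    touch_to_depth: Callable[[Array], Array],
    adapt_depth: Callable[[Array], Array],
    depth_to_touch: Callable[[Array], Array],
) -> Callable[[Array], Array]:
    """Chain the three stages: touch -> depth -> adapted depth -> touch."""

    def translate(src_touch: Array) -> Array:
        depth = touch_to_depth(src_touch)   # stage 1: source-sensor depth
        adapted = adapt_depth(depth)        # stage 2: match target sensor specs
        return depth_to_touch(adapted)      # stage 3: target-sensor image

    return translate


def wrap_policy(
    policy: Callable[[Array], Array],
    translate: Callable[[Array], Array],
) -> Callable[[Array], Array]:
    """Run an unchanged policy on translated observations (no retraining)."""

    def act(src_touch: Array) -> Array:
        return policy(translate(src_touch))

    return act


# --- Toy stand-ins (assumptions, not the paper's models) -------------------

def toy_touch_to_depth(img: Array) -> Array:
    # Pretend depth is the mean over color channels: (H, W, 3) -> (H, W).
    return img.mean(axis=-1)


def toy_adapt_depth(depth: Array, out_hw: Tuple[int, int] = (4, 4)) -> Array:
    # Nearest-neighbor resample to a hypothetical target resolution.
    h, w = depth.shape
    rows = np.arange(out_hw[0]) * h // out_hw[0]
    cols = np.arange(out_hw[1]) * w // out_hw[1]
    return depth[np.ix_(rows, cols)]


def toy_depth_to_touch(depth: Array) -> Array:
    # Replicate depth into three channels: (H, W) -> (H, W, 3).
    return np.repeat(depth[..., None], 3, axis=-1)


def toy_policy(img: Array) -> Array:
    # Action = vector from the strongest contact pixel to the image center.
    intensity = img.mean(axis=-1)
    peak = np.array(np.unravel_index(np.argmax(intensity), intensity.shape))
    center = (np.array(intensity.shape) - 1) / 2.0
    return center - peak


translate = compose_t2d2(toy_touch_to_depth, toy_adapt_depth, toy_depth_to_touch)
transferred_policy = wrap_policy(toy_policy, translate)
```

Because the policy only ever sees images in the target sensor's format, supporting another source sensor in this sketch means swapping the first two stages while the policy and the final stage stay untouched.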