Augmenting Tactile Simulators with Real-like and Zero-Shot Capabilities

Osher Azulay, Alon Mizrahi, Nimrod Curtis, Avishai Sintov

TL;DR

This paper addresses the reality gap in tactile sensing for high‑resolution optical sensors by introducing SightGAN, a bidirectional, CycleGAN‑based framework augmented with two losses to preserve background and contact semantics in difference images. SightGAN enables zero‑shot transfer to new AllSight sensors by learning real↔sim mappings that retain accurate contact positioning and embedded force cues, achieving significant improvements in image quality (FID/KID) over prior CycleGAN approaches (approx. $47\%$ and $16\%$ respectively) and enabling practical downstream modeling. The approach yields a contact position RMSE of about $3.49$ mm in zero‑shot scenarios, with further gains toward $1$ mm when modest real data is used for fine‑tuning, while preserving force information with RMSEs around $0.42$–$0.81$ N. The framework supports training reinforcement learning policies in manipulation tasks using synthetic yet realistic tactile data, and the authors provide open‑source data and a simulator integration to facilitate broader adoption.
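
The position errors quoted above are root-mean-square errors (RMSE) in millimetres. As a minimal sketch of how such a figure could be computed, the snippet below assumes the RMSE is taken over the Euclidean distance between predicted and ground-truth 3D contact positions; the function name `position_rmse` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np


def position_rmse(pred_mm, true_mm):
    """RMSE over per-sample Euclidean distances (assumed definition)."""
    pred = np.asarray(pred_mm, dtype=float)
    true = np.asarray(true_mm, dtype=float)
    # Squared Euclidean distance between each predicted and true contact point.
    sq_dist = np.sum((pred - true) ** 2, axis=1)
    return float(np.sqrt(np.mean(sq_dist)))


# Hypothetical contact positions in millimetres (illustrative values only).
true_positions = np.array([[0.0, 0.0, 10.0], [5.0, 0.0, 8.0]])
pred_positions = np.array([[3.0, 4.0, 10.0], [5.0, 0.0, 8.0]])

rmse = position_rmse(pred_positions, true_positions)
print(f"position RMSE: {rmse:.3f} mm")
```

A per-axis RMSE would be an equally plausible reading of the metric; the sketch only fixes one convention to make the arithmetic concrete.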

Abstract

Simulating tactile perception could potentially leverage the learning capabilities of robotic systems in manipulation tasks. However, the reality gap of simulators for high-resolution tactile sensors remains large. Models trained on simulated data often fail in zero-shot inference and require fine-tuning with real data. In addition, work on high-resolution sensors commonly focuses on those with flat surfaces, while 3D round sensors are essential for dexterous manipulation. In this paper, we propose a bi-directional Generative Adversarial Network (GAN) termed SightGAN. SightGAN builds on the earlier CycleGAN while including two additional loss components aimed at accurately reconstructing background and contact patterns, including small contact traces. The proposed SightGAN learns real-to-sim and sim-to-real processes over difference images. It is shown to generate real-like synthetic images while maintaining accurate contact positioning. The generated images can be used to train zero-shot models for newly fabricated sensors. Consequently, the resulting sim-to-real generator could be built on top of the tactile simulator to provide a real-world framework. Potentially, the framework can be used to train, for instance, reinforcement learning policies for manipulation tasks. The proposed model is verified in extensive experiments with test data collected from real sensors and is also shown to maintain embedded force information within the tactile images.
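
The abstract states that SightGAN operates on difference images. A minimal sketch of that idea follows, assuming a difference image is the contact frame minus a no-contact reference frame from the same sensor; the function name `difference_image` and the toy pixel values are hypothetical, and the paper's exact preprocessing may differ.

```python
import numpy as np


def difference_image(contact_img, reference_img):
    """Signed per-pixel difference between a contact frame and a reference frame."""
    # Cast to a signed type first: subtracting uint8 arrays directly wraps around.
    contact = np.asarray(contact_img, dtype=np.int16)
    reference = np.asarray(reference_img, dtype=np.int16)
    return contact - reference


# Two idealized sensors whose backgrounds differ in brightness (toy values).
reference_a = np.full((4, 4, 3), 100, dtype=np.uint8)
reference_b = np.full((4, 4, 3), 120, dtype=np.uint8)

# The same contact brightens one pixel by 60 on both sensors.
contact_a = reference_a.copy()
contact_a[1, 2] += 60
contact_b = reference_b.copy()
contact_b[1, 2] += 60

diff_a = difference_image(contact_a, reference_a)
diff_b = difference_image(contact_b, reference_b)
print(np.array_equal(diff_a, diff_b))
```

In this idealized case the sensor-specific background cancels and only the contact trace remains, which is one intuition for why working on difference images could help a model carry over to a newly fabricated sensor.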

Paper Structure (17 sections, 9 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: The sim-to-real generator from the trained SightGAN model maps simulated tactile images to real-like images of a 3D round tactile sensor. Because the generated images are close to reality, various models can be trained using the simulator. In this example, a position estimator provides accurate contact labels for the image, making the framework a suitable simulator for various tasks.
  • Figure 2: Schematic of the SightGAN model. The model operates on difference images in order to improve generalizability to new sensors. The top and bottom rows illustrate the sim-to-real-to-sim and real-to-sim-to-real processes, respectively. The cycle consistency loss of CycleGAN is augmented by two additional losses aimed at providing pixel-level domain adaptation of the contacts.
  • Figure 3: (Left) The AllSight tactile sensor with the internal view of the camera during contact with a round indenter. (Right) Structure illustration of the AllSight sensor.
  • Figure 4: Position estimation error on the test sensor of $f_\theta$ trained with synthetic data from SightGAN, with regard to the number of training sensors used to train SightGAN.
  • Figure 5: Position estimation error with regard to the number of real images from the test sensor used to fine-tune model $f_\theta$. Results with zero new tactile images are the zero-shot transfer errors without any fine-tuning.
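
The Figure 2 caption refers to the cycle consistency loss of CycleGAN, which penalizes the L1 distance between an image and its reconstruction after a round trip through both generators. The sketch below illustrates only that standard term. The toy generators `toy_s2r`, `toy_r2s`, and `bad_r2s` are hypothetical stand-ins for trained networks, and the two additional SightGAN losses are not reproduced here because their form is not given in this summary.

```python
import numpy as np


def cycle_consistency_loss(sim_batch, real_batch, g_s2r, g_r2s):
    """Mean L1 reconstruction error of both round trips (standard CycleGAN term)."""
    sim_rec = g_r2s(g_s2r(sim_batch))     # sim -> real -> sim
    real_rec = g_s2r(g_r2s(real_batch))   # real -> sim -> real
    loss_sim = np.mean(np.abs(sim_rec - sim_batch))
    loss_real = np.mean(np.abs(real_rec - real_batch))
    return float(loss_sim + loss_real)


# Toy "generators": simple affine maps instead of neural networks (assumption).
def toy_s2r(x):
    return 2.0 * x + 1.0


def toy_r2s(y):
    return (y - 1.0) / 2.0   # exact inverse of toy_s2r


def bad_r2s(y):
    return y / 2.0           # not an inverse, so round trips drift


rng = np.random.default_rng(0)
sim = rng.random((2, 8, 8, 3))
real = rng.random((2, 8, 8, 3))

perfect = cycle_consistency_loss(sim, real, toy_s2r, toy_r2s)
imperfect = cycle_consistency_loss(sim, real, toy_s2r, bad_r2s)
print(perfect, imperfect)
```

When the two mappings invert each other the loss is essentially zero, and it grows as the round trips drift from the inputs. In a real training loop this term would be combined with adversarial losses and, for SightGAN, with its two extra contact-preserving losses.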