Augmenting Tactile Simulators with Real-like and Zero-Shot Capabilities
Osher Azulay, Alon Mizrahi, Nimrod Curtis, Avishai Sintov
TL;DR
This paper addresses the reality gap in tactile sensing for high‑resolution optical sensors by introducing SightGAN, a bidirectional, CycleGAN‑based framework augmented with two losses to preserve background and contact semantics in difference images. SightGAN enables zero‑shot transfer to new AllSight sensors by learning real↔sim mappings that retain accurate contact positioning and embedded force cues, achieving significant improvements in image quality (FID/KID) over prior CycleGAN approaches (approx. $47\%$ and $16\%$ respectively) and enabling practical downstream modeling. The approach yields a contact position RMSE of about $3.49$ mm in zero‑shot scenarios, with further gains toward $1$ mm when modest real data is used for fine‑tuning, while preserving force information with RMSEs around $0.42$–$0.81$ N. The framework supports training reinforcement learning policies in manipulation tasks using synthetic yet realistic tactile data, and the authors provide open‑source data and a simulator integration to facilitate broader adoption.
Abstract
Simulating tactile perception could potentially leverage the learning capabilities of robotic systems in manipulation tasks. However, the reality gap of simulators for high-resolution tactile sensors remains large. Models trained on simulated data often fail in zero-shot inference and require fine-tuning with real data. In addition, work on high-resolution sensors commonly focuses on those with flat surfaces, while 3D round sensors are essential for dexterous manipulation. In this paper, we propose a bi-directional Generative Adversarial Network (GAN) termed SightGAN. SightGAN builds on the earlier CycleGAN and adds two loss components aimed at accurately reconstructing background and contact patterns, including small contact traces. The proposed SightGAN learns real-to-sim and sim-to-real processes over difference images. It is shown to generate real-like synthetic images while maintaining accurate contact positioning. The generated images can be used to train zero-shot models for newly fabricated sensors. Consequently, the resulting sim-to-real generator could be built on top of the tactile simulator to provide a real-world framework. Potentially, the framework can be used to train, for instance, reinforcement learning policies for manipulation tasks. The proposed model is verified in extensive experiments with test data collected from real sensors and is also shown to maintain embedded force information within the tactile images.
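As a rough illustration of two ideas named in the abstract, the sketch below computes a difference image (a contact frame minus a no-contact reference frame) and a CycleGAN-style L1 cycle-consistency term on a round-trip reconstruction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy 8x8 images, and the identity stand-ins for the two generators are all hypothetical, and SightGAN's two additional losses are not reproduced here because the abstract does not define them.

```python
import numpy as np


def difference_image(contact_img, reference_img):
    """Subtract the no-contact reference frame from a contact frame."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    return contact_img.astype(np.float32) - reference_img.astype(np.float32)


def cycle_consistency_loss(x, x_reconstructed):
    """Mean absolute (L1) error between an image and its round-trip reconstruction."""
    return float(np.mean(np.abs(x - x_reconstructed)))


# Toy data (assumption): a random reference frame and a contact frame that
# differs from it only inside a small square "contact" region.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
contact = reference.copy()
contact[2:4, 2:4, :] = 255

diff = difference_image(contact, reference)


# Hypothetical stand-ins for the learned generators; real ones would be
# trained networks mapping between the simulated and real image domains.
def sim_to_real(img):
    return img


def real_to_sim(img):
    return img


# Round trip sim -> real -> sim; with identity generators the loss is zero.
loss = cycle_consistency_loss(diff, real_to_sim(sim_to_real(diff)))
```

Working on difference images removes the static background, so pixels outside the contact region are zero and the generators only need to model the contact-induced change.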
