Sensor-Invariant Tactile Representation
Harsh Gupta, Yuchen Mo, Shengmiao Jin, Wenzhen Yuan
TL;DR
This work addresses the transfer of tactile perception models across different vision-based tactile sensors by learning Sensor-Invariant Tactile Representations (SITR). The method combines calibration images, a transformer-based encoder, normal-map reconstruction, and supervised contrastive learning, and is trained on a large synthetic dataset covering 100 sensor configurations. It is evaluated by zero-shot transfer to real GelSight-based sensors on shape reconstruction, object classification, and pose estimation. SITR consistently outperforms strong baselines in inter-sensor transfer, preserving contact geometry and generalizing across sensors. The approach supports scalable data and model transfer in tactile sensing and may extend to other sensor designs.
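To make the supervised contrastive component mentioned above more concrete, here is a minimal NumPy sketch of a generic supervised contrastive loss. It is an illustration under stated assumptions, not the authors' implementation. The function name `supcon_loss`, the `temperature` value, and the labeling scheme are hypothetical: the sketch assumes each label identifies one contact, so that embeddings of the same contact seen by different sensors are treated as positives.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative sketch).

    embeddings: (n, d) array of feature vectors.
    labels: (n,) array; equal labels mark positives (ASSUMPTION: the same
            contact observed by different sensors shares a label).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)

    # L2-normalize so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each anchor's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denom

    # Positives: same label, different sample.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy batch: two contacts, each "seen" by two sensors.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned = supcon_loss(features, np.array([0, 0, 1, 1]))     # positives coincide
misaligned = supcon_loss(features, np.array([0, 1, 0, 1]))  # positives are orthogonal
```

In this toy batch the loss is small when embeddings that share a label coincide and large when they do not. This is the pressure that pulls same-contact embeddings from different sensors together.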
Abstract
High-resolution tactile sensors have become critical for embodied perception and robotic manipulation. However, a key challenge in the field is the lack of transferability between sensors due to design and manufacturing variations, which result in significant differences in tactile signals. This limitation hinders the ability to transfer models or knowledge learned from one sensor to another. To address this, we introduce a novel method for extracting Sensor-Invariant Tactile Representations (SITR), enabling zero-shot transfer across optical tactile sensors. Our approach utilizes a transformer-based architecture trained on a diverse dataset of simulated sensor designs, allowing it to generalize to new sensors in the real world with minimal calibration. Experimental results demonstrate the method's effectiveness across various tactile sensing applications, facilitating data and model transferability for future advancements in the field.
