HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning
Hongyu Li, Snehal Dikhale, Jinda Cui, Soshi Iba, Nawid Jamali
TL;DR
HyperTaxel tackles the challenge of low-resolution, taxel-based tactile sensing by learning a geometrically informed tactile representation using a graph neural network and contrastive learning, then performing hyper-resolution to map sparse tactile input to high-resolution object surfaces via multi-contact localization. The approach leverages a CLIP-style objective between tactile embeddings and surface embeddings and uses an offline contact database to resolve multiple simultaneous contacts, formalized as a multipartite graph optimization. Experiments on synthetic Isaac Sim data and real-world surface classification demonstrate that HyperTaxel captures geometric cues such as flatness and curvature, improves 6D in-hand pose estimation when integrated with ViTa, and yields robust sim-to-real transfer across sensor layouts and objects. The framework thus enhances dexterous manipulation capabilities by providing a geometry-aware, high-resolution tactile representation and a generalizable hyper-resolution mechanism.
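To make the CLIP-style objective mentioned above concrete, here is a minimal NumPy sketch of a symmetric contrastive (InfoNCE) loss between a batch of tactile embeddings and the matching surface embeddings. It illustrates the general technique only: the function names, the temperature value, and the use of plain arrays in place of graph neural network outputs are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_loss(tactile_emb, surface_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each array is a matching pair.

    tactile_emb, surface_emb: arrays of shape (batch, dim).
    temperature: assumed value, not taken from the paper.
    """
    # L2-normalize so the dot product is a cosine similarity.
    t = tactile_emb / np.linalg.norm(tactile_emb, axis=1, keepdims=True)
    s = surface_emb / np.linalg.norm(surface_emb, axis=1, keepdims=True)

    # Pairwise similarities; matching pairs lie on the diagonal.
    logits = t @ s.T / temperature
    idx = np.arange(len(t))

    # Cross-entropy in both directions: tactile -> surface and surface -> tactile.
    loss_t2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_s2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2s + loss_s2t)


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned_loss = clip_style_loss(emb, emb)
mismatched_loss = clip_style_loss(emb, np.roll(emb, 1, axis=0))
```

In this toy usage, the aligned batch yields a lower loss than the batch whose surface embeddings are shifted by one row, which is the behavior a contrastive objective of this kind is meant to reward.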
Abstract
To achieve dexterity comparable to that of humans, robots must intelligently process tactile sensor data. Taxel-based tactile signals often have low spatial resolution and non-standardized representations. In this paper, we propose a novel framework, HyperTaxel, for learning a geometrically-informed representation of taxel-based tactile signals to address challenges associated with their spatial resolution. We use this representation and a contrastive learning objective to encode and map sparse low-resolution taxel signals to high-resolution contact surfaces. To address the uncertainty inherent in these signals, we leverage joint probability distributions across multiple simultaneous contacts to improve taxel hyper-resolution. We compare our representation with two baselines and present results that suggest it outperforms both. Furthermore, we present qualitative results that demonstrate that the learned representation captures the geometric features of the contact surface, such as flatness, curvature, and edges, and generalizes across different objects and sensor configurations. Moreover, we present results that suggest our representation improves the performance of various downstream tasks, such as surface classification, 6D in-hand pose estimation, and sim-to-real transfer.
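The idea of using a joint distribution over simultaneous contacts can be illustrated with a small, hypothetical sketch. Each contact has a few candidate surface locations with a per-contact log-probability, and one candidate is chosen per contact so that the combined score is highest. The pairwise term below, which penalizes candidate pairs whose spacing disagrees with a known distance between sensors, is an assumption made for illustration; the paper's actual multipartite graph formulation may differ, and the exhaustive search is only practical for toy sizes.

```python
import itertools

import numpy as np


def select_joint_contacts(unary_logp, candidate_pos, sensor_dist, sigma=0.01):
    """Pick one candidate per contact by maximizing a joint log-score.

    unary_logp:    list of K arrays, each (N_k,), per-candidate log-probabilities.
    candidate_pos: list of K arrays, each (N_k, 3), candidate positions on the object.
    sensor_dist:   (K, K) array of known distances between contacting sensors.
    sigma:         assumed tolerance on distance mismatch (hypothetical parameter).
    """
    num_contacts = len(unary_logp)
    best_score, best_choice = -np.inf, None

    # Exhaustive search over one candidate per contact (toy-sized problems only).
    for choice in itertools.product(*(range(len(u)) for u in unary_logp)):
        # Sum of independent per-contact log-probabilities.
        score = sum(unary_logp[k][choice[k]] for k in range(num_contacts))

        # Pairwise consistency: penalize spacing that disagrees with the sensors.
        for a, b in itertools.combinations(range(num_contacts), 2):
            d = np.linalg.norm(candidate_pos[a][choice[a]] - candidate_pos[b][choice[b]])
            score -= (d - sensor_dist[a, b]) ** 2 / (2 * sigma ** 2)

        if score > best_score:
            best_score, best_choice = score, choice
    return best_choice, best_score


# Two contacts, two candidates each; the sensors are 5 cm apart.
unary = [np.log([0.55, 0.45]), np.log([0.10, 0.90])]
positions = [
    np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    np.array([[0.0, 0.05, 0.0], [1.0, 0.05, 0.0]]),
]
distances = np.array([[0.0, 0.05], [0.05, 0.0]])
choice, score = select_joint_contacts(unary, positions, distances)
```

In this toy case, the first contact alone slightly prefers its candidate 0, but the second contact's strong preference and the spacing constraint together make the pair of candidates at index 1 the best joint choice.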
