Imagine2touch: Predictive Tactile Sensing for Robotic Manipulation using Efficient Low-Dimensional Signals

Abdallah Ayad, Adrian Röfer, Nick Heppert, Abhinav Valada

TL;DR

Imagine2touch introduces a cross-modal framework that predicts tactile readings from shallow depth-image patches to endow robots with predictive touch capabilities. The approach uses a compact neural architecture to map $z_d \in \mathbb{R}^{48\times48}$ to $\tilde{\tau} \in \mathbb{R}^{15}$, trained on a small, low-cost ReSkin dataset of 1630 tactile–vision pairs, and evaluated via an ensemble-based object recognition pipeline. Results show that the predictive tactile signal supports object recognition after multiple touches, outperforming a proprioceptive baseline and generalizing to out-of-distribution objects. This work highlights the practical potential of inexpensive tactile sensors for visuo-tactile perception and points to future directions such as reconstructing depth from tactile signals for full 3D tactile understanding.
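To make the input and output shapes concrete, the following is a minimal sketch of a mapping from a $48\times48$ depth patch to a 15-dimensional tactile vector, scored with the mean squared error that the Figure 1 caption names as the training loss. The two-layer structure, the hidden width of 32, the random initialization, and all function names are assumptions made for illustration; the text here does not specify the authors' architecture, and the data below is synthetic.

```python
import random

PATCH_SIDE = 48    # depth patch is 48 x 48 (from the text)
TACTILE_DIM = 15   # tactile reading has 15 values (from the text)
HIDDEN_DIM = 32    # hypothetical hidden width, not taken from the paper


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small uniform random weights."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def predict_touch(depth_patch, layers):
    """Map a 48 x 48 depth patch to a 15-dimensional tactile prediction."""
    flat = [v for row in depth_patch for v in row]
    hidden = dense(flat, layers[0], relu=True)
    return dense(hidden, layers[1])


def mse(pred, target):
    """Mean squared error between predicted and measured tactile vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


rng = random.Random(0)
layers = [init_layer(PATCH_SIDE * PATCH_SIDE, HIDDEN_DIM, rng),
          init_layer(HIDDEN_DIM, TACTILE_DIM, rng)]

# Synthetic stand-ins for one depth patch and one measured tactile reading.
patch = [[rng.random() for _ in range(PATCH_SIDE)] for _ in range(PATCH_SIDE)]
measured = [rng.random() for _ in range(TACTILE_DIM)]

predicted = predict_touch(patch, layers)
loss = mse(predicted, measured)
```

The sketch only runs a forward pass and evaluates the loss; training would additionally require gradients of `loss` with respect to the layer weights, which a framework with automatic differentiation would normally supply.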

Abstract

Humans seemingly incorporate potential touch signals in their perception. Our goal is to equip robots with a similar capability, which we term Imagine2touch. Imagine2touch aims to predict the expected touch signal based on a visual patch representing the area to be touched. We use ReSkin, an inexpensive and compact touch sensor, to collect the required dataset through random touching of five basic geometric shapes and one tool. We train Imagine2touch on two of those shapes and validate it on the out-of-distribution (OOD) tool. We demonstrate the efficacy of Imagine2touch through its application to the downstream task of object recognition. In this task, we evaluate Imagine2touch's performance in two experiments, together comprising five out-of-training-distribution objects. Imagine2touch achieves an object recognition accuracy of 58% after ten touches per object, surpassing a proprioception baseline.
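The idea of recognizing an object from several touches can be sketched as evidence accumulation: each measured touch is compared with the touch signal predicted for every candidate object, and the per-touch scores are summed. The sketch below assumes an isotropic Gaussian noise model, a uniform prior over objects, and one predicted signal per touch and candidate; these choices, the function names, and the object labels `cube` and `sphere` are illustrative assumptions and are not taken from the paper's probabilistic touch model.

```python
import math


def gaussian_log_likelihood(measured, predicted, sigma=1.0):
    """Log-density of `measured` under an isotropic Gaussian centred on `predicted`."""
    sq_err = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    dim = len(measured)
    return (-0.5 * sq_err / sigma ** 2
            - dim * math.log(sigma * math.sqrt(2.0 * math.pi)))


def recognize(touches, predictions_per_object, sigma=1.0):
    """Return (best_object, posterior) after accumulating evidence over touches.

    touches: list of measured tactile vectors, one per touch.
    predictions_per_object: dict mapping an object name to a list of
        predicted tactile vectors aligned with `touches`.
    """
    log_scores = {
        name: sum(gaussian_log_likelihood(m, p, sigma)
                  for m, p in zip(touches, preds))
        for name, preds in predictions_per_object.items()
    }
    # Normalise with log-sum-exp to obtain a posterior under a uniform prior.
    top = max(log_scores.values())
    norm = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    posterior = {name: math.exp(s - norm) for name, s in log_scores.items()}
    return max(posterior, key=posterior.get), posterior


# Toy data: 3-dimensional signals instead of 15, two touches, two candidates.
touches = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
predictions = {
    "cube": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],      # hypothetical object
    "sphere": [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],    # hypothetical object
}
best, posterior = recognize(touches, predictions, sigma=0.5)
```

Because the log-likelihoods add up across touches, every additional touch can sharpen the posterior, which is one way to read the reported gain in accuracy after ten touches per object.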

Paper Structure (4 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: (a): Robotic setup for our approach. The alignment vector shows the direction along which the robot moves to collect one data sample that pairs the wrist camera and ReSkin readings. (b): Data flow for training our model and using its inference in object recognition. The depth patch is cropped and processed from the full image using the end-effector pose $\tensor[^{W}]{\mathbf{T}}{_{\tau,i}}$ to match the touch area. It is then passed to the model, which we optimize using the MSE loss between its output and the real touch reading. At recognition time, the robot has access only to possible 3D renderings. We use the probabilistic touch model in \ref{sec:technical_approach:object_recognition} for recognition. (c): Object set: primitives. First row: primitives used for training the model. Second row: primitives used for one instance of the object recognition experiment. (d): Object set: tools. First row: tools used for validating the model. Second row: tools used for the second instance of the object recognition experiment. (e): Full object dataset used for analysis.
  • Figure 2: t-SNE plot of our data distribution. Five-means clustering of our processed depth data points with the associated images, tactile visualizations, and RGB images for example points. The means of the clusters are projected and highlighted in red with associated mean processed depth images and mean tactile visualizations. The plot shows distributed sensor activation, and correspondence between the depth patches and the signals.
  • Figure 3: Shape classification experiment setup. We use differently shaped stamps to indent the statically mounted sensor's gel pad in different locations. From left to right, the shapes are: T, circle, angle, triangle, cross. All stamps are at most $10\,\mathrm{mm}$ wide and $3.5\,\mathrm{mm}$ deep.