PseudoTouch: Efficiently Imaging the Surface Feel of Objects for Robotic Manipulation

Adrian Röfer, Nick Heppert, Abdallah Ayad, Eugenio Chisari, Abhinav Valada

TL;DR

This work introduces PseudoTouch, a lightweight framework that infers tactile readings from small depth patches to create a high-signal visual-tactile embedding. The approach maps depth inputs $z_d \in \mathbb{R}^{17\times17}$ to tactile outputs $\tilde{\tau} \in \mathbb{R}^{15}$ using a compact neural network that is trained on data from eight primitive shapes and applied to everyday objects. The embedding is validated on two tasks: object recognition (reaching $84\%$ accuracy after ten touches on everyday items) and grasp stability prediction, where tactile-derived predictions substantially outperform baselines that rely on partial point clouds, including in sim2real settings. The paper also demonstrates data-efficient training with simulated depth patches and releases data, code, and models to facilitate adoption in robotics research and applications.
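To make the input and output shapes concrete, the following is a minimal sketch of an encoder-decoder that maps a $17\times17$ depth patch to a 15-dimensional tactile reading through a low-dimensional embedding, together with the MSE objective. It assumes a plain two-layer MLP for each half; the embedding size `EMBED_DIM`, hidden width `HIDDEN`, and all function names are hypothetical, and the weights are randomly initialized rather than trained, so this illustrates the data flow, not the authors' architecture.

```python
import numpy as np

PATCH = 17        # depth patch side length (17x17, as stated above)
TACTILE_DIM = 15  # tactile output dimension (as stated above)
EMBED_DIM = 8     # hypothetical embedding size
HIDDEN = 64       # hypothetical hidden-layer width

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """He-style random initialization of a dense layer (illustrative, untrained)."""
    w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    b = np.zeros(n_out)
    return w, b

# Encoder: flattened depth patch -> low-dimensional embedding
enc1 = init_layer(PATCH * PATCH, HIDDEN)
enc2 = init_layer(HIDDEN, EMBED_DIM)
# Decoder: embedding -> predicted tactile reading
dec1 = init_layer(EMBED_DIM, HIDDEN)
dec2 = init_layer(HIDDEN, TACTILE_DIM)

def relu(x):
    return np.maximum(x, 0.0)

def encode(z_d):
    """Map depth patches of shape (B, 17, 17) to embeddings of shape (B, EMBED_DIM)."""
    x = z_d.reshape(z_d.shape[0], -1)
    h = relu(x @ enc1[0] + enc1[1])
    return h @ enc2[0] + enc2[1]

def decode(e):
    """Map embeddings of shape (B, EMBED_DIM) to tactile predictions of shape (B, 15)."""
    h = relu(e @ dec1[0] + dec1[1])
    return h @ dec2[0] + dec2[1]

def pseudotouch(z_d):
    """Full forward pass: depth patch -> embedding -> tactile reading."""
    return decode(encode(z_d))

def mse(pred, target):
    """Mean squared error between predicted and measured tactile readings."""
    return float(np.mean((pred - target) ** 2))
```

For example, `pseudotouch(rng.normal(size=(4, 17, 17)))` returns an array of shape `(4, 15)`, and `mse` of that output against a batch of measured readings gives the training loss that an optimizer would minimize.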

Abstract

Tactile sensing is vital for human dexterous manipulation; however, it has not been widely used in robotics. Compact, low-cost sensing platforms can facilitate a change, but unlike their popular optical counterparts, they are difficult to deploy in high-fidelity tasks because of their low signal dimensionality and the lack of a simulation model. To overcome these challenges, we introduce PseudoTouch, which links high-dimensional structural information to low-dimensional sensor signals. It does so by learning a low-dimensional visual-tactile embedding: we encode a depth patch and decode the tactile signal from the resulting embedding. We collect a dataset of aligned tactile and visual data pairs by randomly touching eight basic geometric shapes and train PseudoTouch on it. We demonstrate the utility of the trained PseudoTouch model in two downstream tasks: object recognition and grasp stability prediction. In the object recognition task, we evaluate the learned embedding on a set of five basic geometric shapes and five household objects. Using PseudoTouch, we achieve an object recognition accuracy of 84% after just ten touches, surpassing a proprioception baseline. For the grasp stability task, we use ACRONYM labels to train and evaluate a grasp success predictor on PseudoTouch's predictions derived from virtual depth information. Our approach yields a 32% absolute improvement in accuracy over the baseline that relies on partial point cloud data. We make the data, code, and trained models publicly available at https://pseudotouch.cs.uni-freiburg.de.
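One way to read the object recognition task is as evidence accumulation: for every object hypothesis, predict the tactile reading at each touched location and score it against the measured reading. The sketch below assumes independent touches, isotropic Gaussian sensor noise with a hypothetical standard deviation `sigma`, and a uniform prior; it ignores the distinction between intended and actual touch locations that the paper's model makes explicit, and `object_posterior` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def object_posterior(predicted, measured, sigma=1.0, prior=None):
    """Posterior over K object hypotheses after N touches.

    predicted: (K, N, D) tactile readings predicted for each hypothesis at each
               touched location (e.g. from a PseudoTouch-style model)
    measured:  (N, D) tactile readings actually recorded
    sigma:     assumed isotropic Gaussian noise std (hypothetical)
    prior:     optional (K,) prior over hypotheses; uniform if None
    """
    k = predicted.shape[0]
    if prior is None:
        prior = np.full(k, 1.0 / k)
    # Sum of squared errors over all touches and channels; under the Gaussian
    # assumption this is proportional to the negative log-likelihood, and the
    # normalization constants are identical for every hypothesis, so they cancel.
    sq_err = np.sum((predicted - measured[None]) ** 2, axis=(1, 2))
    log_post = np.log(prior) - sq_err / (2.0 * sigma ** 2)
    log_post -= log_post.max()  # shift for numerical stability before exp
    post = np.exp(log_post)
    return post / post.sum()
```

With ten touches whose measured readings match hypothesis 0 exactly and differ by one unit per channel from hypothesis 1, the posterior mass concentrates almost entirely on hypothesis 0; adding touches sharpens the posterior in the same way.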

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Overview of the PseudoTouch model, which infers the tactile signal from a visual input image. An automatic, unsupervised data-collection procedure (a) generates a dataset of task-agnostic visual-tactile pairs (b) that is used to train the model (c). We demonstrate the model's utility in two downstream tasks: first, object recognition (i), in which we compare the touch readings PseudoTouch generates for candidate objects with measured readings; second, grasp stability prediction (ii), in which we use PseudoTouch to automatically augment grasp labels with touch readings.
  • Figure 2: Object recognition model. $o$ is an object hypothesis, $l$ is the location where the actual touch was recorded, $\tilde{l}$ is the location we intended to touch, and $\tau$ is the actual result of the touch. $N$ is the total number of touches performed.
  • Figure 3: Objects used for training and validation of PseudoTouch. Left: Primitive-shape objects used for training. Right: Everyday objects used for validation in the object recognition experiments; clockwise from the 12 o'clock position: puncher, apple, bulb, PC mouse, tin box.
  • Figure 4: (a) Overview of the physical robot setup for data collection. The ReSkin sensor is attached to the robot's gripper as a fingertip. Using the RealSense D405 camera, we capture an image before touching the object. The object is attached to a wooden anchor mounted on the table for repeatability. (b) Illustration of data processing and inference. From a real depth image, we crop the region that the robot touched using the recorded end-effector pose ${}^{W}\mathbf{T}_{\tau,i}$. To mitigate the gap between real and simulated data, we use the same pose and a mesh of the object to render a simulated sample. We normalize both depth patches and pass them through our PseudoTouch model. Finally, we minimize the mean squared error (MSE) between the predicted and the actual sensor reading.
  • Figure 5: Left: Setup for grasping with ReSkin sensors. We 3D-print two fingers with suitable bores for mounting the sensors and their gel pads. The microcontrollers are attached to the gripper, and the wires hang loosely so that the movement of the fingers does not strain them. Right: Overview of the objects used in our grasping validation, chosen to cover a range of shapes and weights. Back row: Toy horse, pack of sugar, cup, tape dispenser. Front row: Gum can, mango, tennis ball.
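The cropping step in Figure 4(b) can be sketched as projecting the touched point into the depth image and cutting out a $17\times17$ window around it. This sketch makes several assumptions not stated above: a pinhole camera with intrinsics `(fx, fy, cx, cy)`, a known world-to-camera transform, the contact point taken from the position of the recorded end-effector pose, an axis-aligned crop that ignores sensor orientation, and normalization by subtracting the center depth. The helpers `project`, `crop_depth_patch`, and `normalize_patch` are hypothetical.

```python
import numpy as np

PATCH = 17  # patch side length, matching the 17x17 model input

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point in the camera frame to pixel (u, v)."""
    x, y, z = point_cam
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))

def crop_depth_patch(depth, contact_world, T_cam_world, intrinsics, size=PATCH):
    """Crop a size x size depth window centered on the projected contact point.

    depth:         (H, W) depth image in meters
    contact_world: (3,) contact position in the world frame W (assumed to come
                   from the recorded end-effector pose)
    T_cam_world:   (4, 4) homogeneous transform from world to camera frame
    intrinsics:    (fx, fy, cx, cy) pinhole parameters
    """
    p = T_cam_world @ np.append(contact_world, 1.0)
    u, v = project(p[:3], *intrinsics)
    r = size // 2
    if not (r <= v < depth.shape[0] - r and r <= u < depth.shape[1] - r):
        raise ValueError("patch would extend beyond the image border")
    return depth[v - r:v + r + 1, u - r:u + r + 1].copy()

def normalize_patch(patch):
    """Assumed normalization: express depths relative to the patch center."""
    c = patch.shape[0] // 2
    return patch - patch[c, c]
```

The same crop-and-normalize routine would be applied to the rendered simulated depth image at the same pose, so that real and simulated patches share one input convention before they are passed to the model.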