The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?

Roberto Calandra; Andrew Owens; Manu Upadhyaya; Wenzhen Yuan; Justin Lin; Edward H. Adelson; Sergey Levine

TL;DR

The paper tackles grasp-outcome prediction for a two-finger robot gripper by fusing high-resolution GelSight tactile sensing with RGB vision in an end-to-end learning framework. It introduces a late-fusion CNN that computes the grasp-success probability $y=f(x)$ from inputs $x=(I_{RGB}, I_{GelSightL}, I_{GelSightR})$, incorporating temporal cues such as $I_{T_a}$, $I_{T_b}$, and the GelSight difference $I_{T_b}-I_{T_a}$. On a dataset of 9,269 grasps across 106 objects, tactile and visuo-tactile models outperform vision-only baselines, with the full visuo-tactile model achieving the best predictive accuracy. In real-world grasping on 12 unseen objects, the visuo-tactile model achieved 94% success compared to 80% for vision-only, demonstrating practical benefits for grasp planning. The study also discusses limitations and future directions for more efficient tactile integration.
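
To make the late-fusion idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes that $I_{T_a}$ and $I_{T_b}$ are frames from the same sensor at two times, and that the difference $I_{T_b}-I_{T_a}$ is stacked with $I_{T_b}$ along the channel axis. Each CNN branch is replaced by a stand-in (global average pooling plus one linear layer), and all names (`branch_features`, `gelsight_input`, `predict_success`, `params`) and the random weights are hypothetical.

```python
import numpy as np

def branch_features(image, weights):
    # Stand-in for a per-modality CNN branch (assumption, not the paper's
    # architecture): average over height and width, then linear layer + ReLU.
    pooled = image.mean(axis=(0, 1))           # shape: (channels,)
    return np.maximum(weights @ pooled, 0.0)   # shape: (n_feat,)

def gelsight_input(i_ta, i_tb):
    # Assumed temporal cue: stack the later frame I_Tb with the
    # difference I_Tb - I_Ta along the channel axis.
    return np.concatenate([i_tb, i_tb - i_ta], axis=-1)

def predict_success(rgb, gel_l, gel_r, params):
    # Late fusion: encode each modality separately, concatenate the
    # resulting feature vectors, then apply a shared prediction head.
    feats = np.concatenate([
        branch_features(rgb, params["rgb"]),
        branch_features(gel_l, params["gel_l"]),
        branch_features(gel_r, params["gel_r"]),
    ])
    logit = params["head_w"] @ feats + params["head_b"]
    return 1.0 / (1.0 + np.exp(-logit))        # y = f(x), a value in (0, 1)

# Toy usage with random images and untrained random weights.
rng = np.random.default_rng(0)
h, w, n_feat = 8, 8, 4
params = {
    "rgb": rng.normal(size=(n_feat, 3)),
    "gel_l": rng.normal(size=(n_feat, 6)),
    "gel_r": rng.normal(size=(n_feat, 6)),
    "head_w": rng.normal(size=3 * n_feat),
    "head_b": 0.0,
}
rgb = rng.random((h, w, 3))
gel_l = gelsight_input(rng.random((h, w, 3)), rng.random((h, w, 3)))
gel_r = gelsight_input(rng.random((h, w, 3)), rng.random((h, w, 3)))
y = predict_success(rgb, gel_l, gel_r, params)
```

Because the branches only meet at the concatenation step, a vision-only or tactile-only variant can be obtained in this sketch by dropping the corresponding feature vectors before the head, which mirrors the single-modality comparison described above.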

Abstract

A successful grasp requires careful balancing of the contact forces. Deducing whether a particular grasp will be successful from indirect measurements, such as vision, is therefore quite challenging, and direct sensing of contacts through touch sensing provides an appealing avenue toward more successful and consistent robotic grasping. However, in order to fully evaluate the value of touch sensing for grasp outcome prediction, we must understand how touch sensing can influence outcome prediction accuracy when combined with other modalities. Doing so using conventional model-based techniques is exceptionally difficult. In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch. To that end, we collected more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger, and evaluated visuo-tactile deep neural network models to directly predict grasp outcomes from either modality individually, and from both modalities together. Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.
