FeelAnyForce: Estimating Contact Force Feedback from Tactile Sensation for Vision-Based Tactile Sensors

Amir-Hossein Shahidzadeh, Gabriele Caddeo, Koushik Alapati, Lorenzo Natale, Cornelia Fermüller, Yiannis Aloimonos

TL;DR

The authors collect a dataset of over 200K indentations by using a robotic arm to press various indenters onto a GelSight Mini sensor mounted on a force sensor, then use the data to train a multi-head transformer for force regression. The model achieves a mean absolute error of 4% on a dataset of unseen real-world objects.

Abstract

In this paper, we tackle the problem of estimating 3D contact forces using vision-based tactile sensors. In particular, our goal is to estimate contact forces over a large range (up to 15 N) on any object while generalizing across different vision-based tactile sensors. To this end, we collected a dataset of over 200K indentations using a robotic arm that pressed various indenters onto a GelSight Mini sensor mounted on a force sensor, and then used the data to train a multi-head transformer for force regression. Strong generalization is achieved via accurate data collection and multi-objective optimization that leverages depth contact images. Despite being trained only on primitive shapes and textures, the regressor achieves a mean absolute error of 4% on a dataset of unseen real-world objects. We further evaluate our approach's generalization capability to other GelSight Mini and DIGIT sensors, and propose a reproducible calibration procedure for adapting the pre-trained model to other vision-based sensors. Furthermore, the method was evaluated on real-world tasks, including weighing objects and controlling the deformation of delicate objects, both of which rely on accurate force feedback. Project webpage: http://prg.cs.umd.edu/FeelAnyForce
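
The multi-objective optimization mentioned in the abstract combines a force regression term with a depth reconstruction term. The following is a minimal sketch of such a combined objective, not the authors' implementation: the function name `multi_objective_loss`, the use of mean squared error for both terms, and the `depth_weight` parameter are all assumptions made for illustration, since the abstract does not state the exact loss functions or their weighting.

```python
import numpy as np


def multi_objective_loss(force_pred, force_true, depth_pred, depth_true,
                         depth_weight=1.0):
    """Weighted sum of a force regression term and a depth reconstruction term.

    Both terms use mean squared error here, which is an assumption; the
    weighting factor `depth_weight` is likewise hypothetical.
    """
    force_pred = np.asarray(force_pred, dtype=float)
    force_true = np.asarray(force_true, dtype=float)
    depth_pred = np.asarray(depth_pred, dtype=float)
    depth_true = np.asarray(depth_true, dtype=float)

    # Error on the 3D force vector (Fx, Fy, Fz).
    force_loss = np.mean((force_pred - force_true) ** 2)
    # Per-pixel error on the reconstructed depth contact image.
    depth_loss = np.mean((depth_pred - depth_true) ** 2)

    return force_loss + depth_weight * depth_loss


# Usage: one sample with a 3D force and a tiny 2x2 depth map.
loss = multi_objective_loss(
    force_pred=[1.0, 2.0, 3.0],
    force_true=[1.0, 2.0, 5.0],
    depth_pred=np.zeros((2, 2)),
    depth_true=np.ones((2, 2)),
    depth_weight=0.5,
)
```

In a sketch like this, the depth term acts as an auxiliary signal: it pushes the shared features to encode contact geometry, which is one plausible reading of how depth supervision supports generalization.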

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: We present FeelAnyForce, a method for estimating contact forces with sensor generalization capabilities on vision-based tactile sensors. (a) We collect a dataset of tactile-depth-force data by using a robotic arm to press various indenters onto a tactile sensor mounted on a force sensor. We then test its performance on a set of YCB and real-world objects. (b) To isolate the contact data $T_i$, we subtract the sensor-specific background image. The network is trained to minimize the force regression error along with a depth reconstruction loss. Note that the ground-truth depth $D_i$ is computed using photometric stereo on the GelSight Mini. (c) We showcase real-world experiments conducted with our force estimator.
  • Figure 2: The indenters used for the training procedure.
  • Figure 3: We show sampled frames from trajectories of weighing by pushing. We consider objects that differ in weight, shape, and material.
  • Figure 4: Grasping experiments with a plastic cup. We close the gripper to get $f_2=1.74$ N and $f_3= 2.1$ N from the force sensor and our method in the first and second row, respectively.
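
The background subtraction described in Figure 1(b) can be illustrated with a short sketch. This is an assumption-laden example rather than the authors' preprocessing code: the function name `isolate_contact` is hypothetical, and the choice to keep a signed floating-point difference (rather than clipping or rescaling) is made here only to avoid unsigned-integer wraparound.

```python
import numpy as np


def isolate_contact(tactile_image, background_image):
    """Return the signed difference T_i = raw image - sensor background.

    Inputs are cast to float32 first so that subtracting uint8 images
    does not wrap around for pixels darker than the background.
    """
    raw = np.asarray(tactile_image, dtype=np.float32)
    background = np.asarray(background_image, dtype=np.float32)
    if raw.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    return raw - background


# Usage: a tiny 1x2 grayscale example with uint8 inputs.
raw = np.array([[10, 200]], dtype=np.uint8)
background = np.array([[20, 100]], dtype=np.uint8)
contact = isolate_contact(raw, background)
```

Because the background image is specific to each sensor, subtracting it removes per-device appearance differences before the image reaches the network, which fits the caption's description of isolating the contact data.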