SemanticFeels: Semantic Labeling during In-Hand Manipulation

Anas Al Shikh Khalil, Haozhi Qi, Roberto Calandra

TL;DR

SemanticFeels extends NeuralFeels to enable semantic labeling of materials during in-hand robot manipulation by fusing tactile-based material predictions with a neural implicit surface representation. The method uses Digit tactile images processed by EfficientNet-B0 to classify local materials, which are embedded into an augmented neural SDF to jointly predict geometry and material regions. A 20,749-sample tactile dataset across four materials supports offline training, and real-time experiments demonstrate high per-sensor accuracy and a 79.87% average material-map matching on multi-material objects. The work advances tactile-anchored semantic understanding in dexterous manipulation, enabling more adaptive and robust manipulation policies under material variation.

Abstract

As robots become increasingly integrated into everyday tasks, their ability to perceive both the shape and properties of objects during in-hand manipulation becomes critical for adaptive and intelligent behavior. We present SemanticFeels, an extension of the NeuralFeels framework that integrates semantic labeling with neural implicit shape representation, from vision and touch. To illustrate its application, we focus on material classification: high-resolution Digit tactile readings are processed by a fine-tuned EfficientNet-B0 convolutional neural network (CNN) to generate local material predictions, which are then embedded into an augmented signed distance field (SDF) network that jointly predicts geometry and continuous material regions. Experimental results show that the system achieves a high correspondence between predicted and actual materials on both single- and multi-material objects, with an average matching accuracy of 79.87% across multiple manipulation trials on a multi-material object.
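
The joint prediction of geometry and material described in the abstract can be sketched as a small network with two output heads. This is a toy illustration, not the authors' architecture: the shared hidden layer, the layer sizes, the random weights, and the names `init_params` and `forward` are all assumptions. The source states only that the augmented SDF network predicts geometry together with continuous material regions over four material classes.

```python
import numpy as np

NUM_MATERIALS = 4  # the source reports four material classes


def init_params(rng, hidden=32):
    """Random weights for a toy two-head MLP (sizes are assumptions)."""
    return {
        "W1": rng.normal(0.0, 0.5, (3, hidden)),
        "b1": np.zeros(hidden),
        "W_sdf": rng.normal(0.0, 0.5, (hidden, 1)),
        "b_sdf": np.zeros(1),
        "W_mat": rng.normal(0.0, 0.5, (hidden, NUM_MATERIALS)),
        "b_mat": np.zeros(NUM_MATERIALS),
    }


def forward(params, points):
    """Map 3D query points (N, 3) to a signed distance and material probabilities."""
    h = np.tanh(points @ params["W1"] + params["b1"])       # shared features (assumed)
    sdf = (h @ params["W_sdf"] + params["b_sdf"])[:, 0]     # geometry head, shape (N,)
    logits = h @ params["W_mat"] + params["b_mat"]          # material head, shape (N, 4)
    logits = logits - logits.max(axis=1, keepdims=True)     # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return sdf, probs


rng = np.random.default_rng(0)
params = init_params(rng)
points = rng.uniform(-1.0, 1.0, (5, 3))                      # five hypothetical query points
sdf, material_probs = forward(params, points)
material_labels = material_probs.argmax(axis=1)              # one class index per point
```

In a setup like this, the tactile classifier's local predictions would serve as supervision for the material head at touched surface points, while the geometry head is trained as in the underlying SDF framework; how SemanticFeels actually couples the two branches is not specified in this summary.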

Paper Structure (14 sections, 1 equation, 5 figures)

Figures (5)

  • Figure 1: SemanticFeels framework: Diagram of our hardware and backend pipeline. An RGB-D camera and multiple tactile sensors collect visual and touch data from objects, which are used to generate point clouds. Tactile images from the tactile sensors are processed by a classification model to predict local material types. These predictions are fused into a neural signed distance field (SDF) that reconstructs the object’s 3D shape while assigning semantic material labels.
  • Figure 2: Hardware and backend setup. (A) Allegro Hand with fingertip Digit sensors and Intel RealSense D435i cameras. (B) Dual-branch network: extended NeuralFeels with a material mapping network. (C) Material samples used for training the material classification model. (D) Objects used for real-time experiments. From left to right: a toy completely covered with fabric, a yarn ball, a wooden candle holder, a stainless steel powder shaker, a plastic toy, and a plastic toy partially covered with a piece of fabric.
  • Figure 3: Model performance on hand-collected dataset. (A) Confusion matrix showing high classification accuracy across four material classes using the hand-collected dataset. (B) Validation accuracy per sensor, with the ring sensor achieving the highest test accuracy (99.60%) and the thumb sensor showing relatively lower performance (97.42%). These results are based on the offline dataset used for model training and validation, prior to deployment in real-time scenarios.
  • Figure 4: Evaluation of material classification using real-time robot-collected data. (A) Heatmap showing mean classification accuracy across Digit sensors. Plastic achieves near-perfect accuracy on all sensors except the thumb. The ring and index fingers yield the highest overall accuracy. Wood and metal are harder to classify, especially with the thumb. (B) Bar plot of classification accuracy by Digit sensor with error bars. Plastic shows high accuracy and low variability (except on the thumb). The ring and index fingers consistently perform better, while wood and metal exhibit lower and more variable accuracy, particularly on the thumb and middle fingers.
  • Figure 5: Evaluation on real-time robot-collected runs of a multi-material object. (A) Matching percentage across four runs, showing progressive improvement in map alignment over time, with final values exceeding 75% in each case. (B) Two examples of the resulting maps. (Left) The ground truth material map, colored by material class, with regions not under study shown in black. (Middle) The predicted material map. (Right) The difference mask used to calculate the matching percentage. (C) An example run showing the progression of the predicted map over time.
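
The matching percentage in Figure 5 is computed from a difference mask between the ground truth and predicted material maps. A minimal sketch of one plausible definition follows: the share of studied pixels whose predicted class equals the ground truth class, with regions not under study excluded. The exact formula is not given in this summary, so the function name `matching_percentage`, the integer label encoding, and the `IGNORE` value are assumptions for illustration.

```python
import numpy as np

IGNORE = -1  # assumed label for regions not under study (shown in black in Figure 5)


def matching_percentage(ground_truth, predicted, ignore_label=IGNORE):
    """Percentage of studied pixels where the predicted class matches the ground truth."""
    ground_truth = np.asarray(ground_truth)
    predicted = np.asarray(predicted)
    if ground_truth.shape != predicted.shape:
        raise ValueError("maps must have the same shape")
    studied = ground_truth != ignore_label           # pixels that count toward the score
    n_studied = int(studied.sum())
    if n_studied == 0:
        raise ValueError("no studied pixels in the ground truth map")
    matches = (ground_truth == predicted) & studied  # complement of the difference mask
    return 100.0 * float(matches.sum()) / n_studied


# Toy 2x3 maps with hypothetical class indices; the last column is not under study.
gt = np.array([[0, 0, IGNORE],
               [1, 1, IGNORE]])
pred = np.array([[0, 1, 2],
                 [1, 1, 0]])
score = matching_percentage(gt, pred)  # 3 of 4 studied pixels match -> 75.0
```

Excluding ignored regions from the denominator keeps the score from being inflated or deflated by parts of the object that were never labeled.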