TactEx: An Explainable Multimodal Robotic Interaction Framework for Human-Like Touch and Hardness Estimation

Felix Verstraete; Lan Wei; Wen Fan; Dandan Zhang

TL;DR

TactEx is an explainable multimodal robotic interaction framework that unifies vision, touch, and language for human-like hardness estimation and interactive guidance. It achieves 90% task success on simple user queries and generalises to novel tasks without large-scale tuning.

Abstract

Accurate perception of object hardness is essential for safe and dexterous contact-rich robotic manipulation. Here, we present TactEx, an explainable multimodal robotic interaction framework that unifies vision, touch, and language for human-like hardness estimation and interactive guidance. We evaluate TactEx on fruit-ripeness assessment, a representative task that requires both tactile sensing and contextual understanding. The system fuses GelSight-Mini tactile streams with RGB observations and language prompts. A ResNet50+LSTM model estimates hardness from sequential tactile data, while a cross-modal alignment module combines visual cues with guidance from a large language model (LLM). This explainable multimodal interface allows users to distinguish ripeness levels with statistically significant class separation (p < 0.01 for all fruit pairs). For touch placement, we compare YOLO with Grounded-SAM (GSAM) and find GSAM to be more robust for fine-grained segmentation and contact-site selection. A lightweight LLM parses user instructions and produces grounded natural-language explanations linked to the tactile outputs. In end-to-end evaluations, TactEx attains 90% task success on simple user queries and generalises to novel tasks without large-scale tuning. These results highlight the promise of combining pretrained visual and tactile models with language grounding to advance explainable, human-like touch perception and decision-making in robotics.
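The abstract reports pairwise class separation (p < 0.01 for all fruit pairs) but this page does not say which statistical test was used. As a minimal sketch, assuming an exact two-sided permutation test on the difference of mean hardness, a pairwise comparison could look like the following. The function names and the hardness values in the usage note are hypothetical and purely illustrative; they are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(a, b):
    """Two-sided exact permutation test on the difference of group means.

    Enumerates every way of splitting the pooled samples into groups of
    the original sizes and counts how often the absolute mean difference
    is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))

    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so ties with the observed value are counted.
        if diff >= observed - 1e-12:
            extreme += 1
        n_splits += 1
    return extreme / n_splits


def pairwise_p_values(hardness_by_class):
    """Run the test for every pair of classes (assumed: one list per class)."""
    names = sorted(hardness_by_class)
    return {
        (x, y): exact_permutation_p(hardness_by_class[x], hardness_by_class[y])
        for x, y in combinations(names, 2)
    }
```

With five samples per class there are 252 possible splits, so two fully separated classes give p = 2/252, which is below 0.01. For example, `pairwise_p_values({"firm": [11, 12, 13, 14, 15], "soft": [1, 2, 3, 4, 5]})` uses made-up hardness values to show the call shape. Exhaustive enumeration only scales to small groups; a sampled permutation test or a parametric test would be the usual substitute for larger datasets.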

Paper Structure (37 sections, 1 equation, 6 figures, 4 tables)

Introduction
Related Work
Hardness and Ripeness Estimation
Language Grounded Multimodal Models
Methods
System Interface
Vision and Object Grounding
Tactile Acquisition and Hardness Estimation
Language Generation and Output
System Components
Visual Servoing Methods
Tactile Perception Methods
LLM
Experiments
Dataset Description
...and 22 more sections

Figures (6)

Figure 1: Overview of TactEx (“The Tactile Explainer”), a multimodal framework for fruit ripeness explanation. Users interact via a chat interface (A), objects are localized with YOLO or GSAM (B1), hardness is estimated with a GelSight sensor (B2), and an LLM composes the final response from the fruit names, locations and hardness values (B3–C). The three components are detailed in section \ref{sec:serv}, section \ref{sec:tac} and section \ref{sec:llm}, respectively.
Figure 2: Example of the Grounded SAM procedure: (a) original scene, (b) object detection with bounding box, (c) SAM result with the inner mask used to compute the centroid.
Figure 3: Data collection: each image was compared to a reference image. If the contact criteria were met, 8 images were captured and transformed into a 2- or 4-image sequence.
Figure 4: Tactile prediction results of the main ResNet50-LSTM3 model after pretraining (a) and fine-tuning (b). This is the model ultimately used within TactEx.
Figure 5: Success rates of the TactEx framework across four interaction scenarios of increasing complexity (Sc1-Sc4), as defined in Table \ref{tab:complexity}. SL-SR: Scenario-Level Success Rate, OL-SR: Object-Level Success Rate, w/o: without, Sc: Scenario.
...and 1 more figure
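The Figure 2 caption mentions an inner mask used to compute the centroid for contact-site selection. How the inner mask is obtained is not described on this page, so the sketch below assumes it is a one-step binary erosion of the segmentation mask, followed by a plain pixel centroid. Both helper names are hypothetical, and the mask is represented as a nested list of 0/1 values for simplicity.

```python
def erode(mask):
    """One step of binary erosion with a 4-neighbourhood.

    A pixel survives only if it and its four direct neighbours are set.
    Pixels outside the image count as unset, so border pixels are removed.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0

    def is_set(r, c):
        return 0 <= r < height and 0 <= c < width and bool(mask[r][c])

    return [
        [
            1 if (is_set(r, c) and is_set(r - 1, c) and is_set(r + 1, c)
                  and is_set(r, c - 1) and is_set(r, c + 1)) else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


def mask_centroid(mask):
    """Return the (row, col) centroid of the set pixels, or None if empty."""
    row_sum = col_sum = count = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)
```

Calling `mask_centroid(erode(mask))` gives a candidate touch point in pixel coordinates. Eroding first pulls the point away from the object boundary, which is one plausible reason to prefer an inner mask; mapping that pixel to a robot pose would additionally need the camera calibration, which is outside this sketch.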
