Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies
Zohar Rimon, Elisei Shafer, Tal Tepper, Efrat Shimron, Aviv Tamar
TL;DR
Toward Artificial Palpation presents a self-supervised representation learning framework that converts a sequence of tactile measurements into a single latent representation $z_T$ capturing the structure of a soft object. An encoder-decoder acts as a forward model for forces, while a downstream flow-matching predictor maps the representation to ground-truth MRI images, enabling tactile imaging and change detection. The approach is validated in PalpationSim and on real modular breast phantoms, showing that lump position and size can be recovered more reliably from the learned representation than from raw force maps, with data volume and simple augmentations driving the gains. The work points toward foundation models for touch, outlines the data requirements for clinical translation, and discusses limitations and future directions for real-world deployment.
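For readers unfamiliar with flow matching, one common form of the training objective is written out below as a rough guide. It assumes a linear interpolation path between a noise sample $x_0$ and an MRI image $x_1$, with a velocity network $v_\theta$ conditioned on $z_T$. The symbols $x_0$, $x_1$, $x_t$ and $v_\theta$ are introduced here for illustration, and the paper's exact formulation may differ.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; (x_1, z_T)} \left\| v_\theta(x_t, t, z_T) - (x_1 - x_0) \right\|^2
$$

Under this assumption, an image is generated at test time by drawing $x_0$ from the noise distribution and integrating $\dot{x}_t = v_\theta(x_t, t, z_T)$ from $t = 0$ to $t = 1$.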
Abstract
Palpation, the use of touch in medical examination, is almost exclusively performed by humans. We investigate a proof of concept for an artificial palpation method based on self-supervised learning. Our key idea is that an encoder-decoder framework can learn a $\textit{representation}$ from a sequence of tactile measurements that contains all the relevant information about the palpated object. We conjecture that such a representation can be used for downstream tasks such as tactile imaging and change detection. With enough training data, it should capture intricate patterns in the tactile measurements that go beyond a simple map of forces -- the current state of the art. To validate our approach, we both develop a simulation environment and collect a real-world dataset of soft objects and corresponding ground truth images obtained by magnetic resonance imaging (MRI). We collect palpation sequences using a robot equipped with a tactile sensor, and train a model that predicts sensory readings at different positions on the object. We investigate the representation learned in this process, and demonstrate its use in imaging and change detection.
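The forward-model idea (encode a palpation sequence into $z_T$, then predict the force at a queried position) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' architecture, data, or simulator: `simulate_force` is a made-up stand-in for a tactile sensor, the encoder is a fixed random recurrent network rather than a trained one, and the decoder is a linear least-squares readout on hand-built features.

```python
import numpy as np

rng = np.random.default_rng(0)
H, K_SIDE = 16, 3  # latent size and RBF grid side (arbitrary illustrative choices)

def simulate_force(pos, lump):
    """Toy tactile reading: stiffer response near a hidden lump (not PalpationSim)."""
    d2 = np.sum((pos - lump) ** 2, axis=-1)
    return 1.0 + 2.0 * np.exp(-d2 / (2 * 0.15 ** 2))

# Fixed random encoder weights; the paper trains its encoder, this sketch does not.
W_in = rng.normal(scale=0.5, size=(H, 3))
W_rec = rng.normal(scale=0.9 / np.sqrt(H), size=(H, H))

def encode(positions, forces):
    """Fold a sequence of (x, y, force) readings into a single latent vector z_T."""
    z = np.zeros(H)
    for p, f in zip(positions, forces):
        z = np.tanh(W_in @ np.append(p, f) + W_rec @ z)
    return z

grid = np.linspace(0.0, 1.0, K_SIDE)
centers = np.array([(gx, gy) for gx in grid for gy in grid])

def decoder_features(z, query):
    """Features that couple the latent z_T with a query position."""
    rbf = np.exp(-np.sum((query - centers) ** 2, axis=-1) / (2 * 0.3 ** 2))
    return np.concatenate([[1.0], z, rbf, np.outer(z, rbf).ravel()])

def make_dataset(n_objects=200, seq_len=20, n_queries=8):
    """Build (features, target force) pairs: one z_T per object, several queries each."""
    X, y = [], []
    for _ in range(n_objects):
        lump = rng.uniform(0.2, 0.8, size=2)               # hidden lump position
        touched = rng.uniform(0.0, 1.0, size=(seq_len, 2))  # palpated positions
        z_T = encode(touched, simulate_force(touched, lump))
        queries = rng.uniform(0.0, 1.0, size=(n_queries, 2))
        for q, f in zip(queries, simulate_force(queries, lump)):
            X.append(decoder_features(z_T, q))
            y.append(f)
    return np.array(X), np.array(y)

X, y = make_dataset()
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # linear decoder fit by least squares
train_mse = float(np.mean((X @ w - y) ** 2))
baseline_mse = float(np.mean((y - y.mean()) ** 2))
print(f"forward-model MSE {train_mse:.4f} vs constant baseline {baseline_mse:.4f}")
```

The printed numbers are training errors only. A meaningful evaluation would hold out entire objects, and a learned encoder trained end to end on the force-prediction loss would replace the fixed random one.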
