The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning

Jin Chen, Yifeng Lin, Chao Zeng, Si Wu, Tiesong Zhao

Abstract

The standardization of vibrotactile data by the IEEE P1918.1 working group has greatly advanced its applications in virtual reality, human-computer interaction, and embodied artificial intelligence. Despite these efforts, the semantic interpretation and understanding of vibrotactile signals remain an unresolved challenge. In this paper, we make the first attempt to address vibrotactile captioning, i.e., generating natural language descriptions from vibrotactile signals. We propose Vibrotactile Periodic-Aperiodic Captioning (ViPAC), a method designed to handle the intrinsic properties of vibrotactile data, including hybrid periodic-aperiodic structures and the lack of spatial semantics. Specifically, ViPAC employs a dual-branch strategy to disentangle periodic and aperiodic components, combined with a dynamic fusion mechanism that adaptively integrates signal features. It also introduces an orthogonality constraint and weighting regularization to ensure feature complementarity and fusion consistency. Additionally, we construct LMT108-CAP, the first vibrotactile-text paired dataset, using GPT-4o to generate five constrained captions per surface image from the popular LMT-108 dataset. Experiments show that ViPAC significantly outperforms baseline methods adapted from audio and image captioning, achieving superior lexical fidelity and semantic alignment.
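The dynamic fusion mechanism described above can be illustrated with a minimal numpy sketch: a learned scalar gate (a sigmoid over the concatenated branch features) blends the periodic and aperiodic representations. The function name, parameter shapes, and the scalar-gate formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gated_fusion(f_per, f_aper, w_gate, b_gate):
    """Blend periodic and aperiodic feature vectors with a learned gate.

    Hypothetical sketch: w_gate (shape 2d) and b_gate (scalar) would be
    learned parameters; the sigmoid output plays the role of an estimated
    periodicity score in (0, 1).
    """
    z = np.concatenate([f_per, f_aper], axis=-1) @ w_gate + b_gate
    w = 1.0 / (1.0 + np.exp(-z))          # periodicity score
    return w * f_per + (1.0 - w) * f_aper  # convex combination of branches
```

A saturated gate (score near 1) passes the periodic branch through almost unchanged, while a mid-range score yields an even blend.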

Paper Structure

This paper contains 20 sections, 14 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Application scenarios of vibrotactile captioning.
  • Figure 2: Illustration of the vibrotactile-text dataset generation process. Surface images from the LMT-108 dataset are provided as input to GPT-4o, which generates five textual descriptions per image under predefined linguistic constraints. These descriptions are then paired with the corresponding triaxial acceleration signals collected from the same material surfaces, resulting in the final vibrotactile-text dataset.
  • Figure 3: Examples of material surface images and their corresponding three-axis vibrotactile signals. Top: materials with regular textures exhibit strong periodicity. Bottom: irregular surfaces yield noisy, aperiodic signals. This motivates the use of distinct modeling pathways.
  • Figure 4: ViPAC takes triaxial acceleration signals as input and applies DFT321 to obtain 1D vibration data. These signals are processed by a dual-branch encoder that separately models periodic and aperiodic components using FAN-based frequency analysis and Transformer+LSTM-based temporal modeling, respectively. The extracted features are dynamically fused based on estimated periodicity scores, and the fused representation is decoded into natural language using a Transformer decoder.
  • Figure 5: Qualitative comparisons between ViPAC generated captions and five GPT-4o reference descriptions for four representative materials. Matched phrases are highlighted to emphasize semantic consistency. The selected samples—covering regular perforations, fine grids, rough glitter, and irregular bumps—demonstrate ViPAC’s ability to produce accurate and diverse textual descriptions directly from vibrotactile signals.
  • ...and 1 more figure
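The Figure 4 caption mentions DFT321, the standard reduction of a triaxial acceleration signal to a single 1D vibration signal: the combined magnitude spectrum preserves the total spectral energy of the three axes, and the phase is taken from the summed spectrum. The sketch below, in numpy, is my reading of that standard algorithm and is not taken from the paper's code.

```python
import numpy as np

def dft321(ax, ay, az):
    """Reduce triaxial acceleration to one real 1-D vibration signal.

    Sketch of the DFT321 algorithm: the output magnitude spectrum is
    sqrt(|X|^2 + |Y|^2 + |Z|^2), preserving the combined spectral energy
    of the three axes; the phase is borrowed from the summed spectrum.
    """
    X = np.fft.rfft(ax)
    Y = np.fft.rfft(ay)
    Z = np.fft.rfft(az)
    mag = np.sqrt(np.abs(X) ** 2 + np.abs(Y) ** 2 + np.abs(Z) ** 2)
    phase = np.angle(X + Y + Z)
    # Invert back to a real time-domain signal of the original length.
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(ax))
```

By construction, the magnitude spectrum of the returned signal matches the per-bin energy of the three input axes, which is the property that makes the 1D signal a faithful input for the dual-branch encoder.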