Touch Speaks, Sound Feels: A Multimodal Approach to Affective and Social Touch from Robots to Humans
Qiaoqiao Ren, Tony Belpaeme
TL;DR
This study investigates affective and social touch from robots to humans by integrating a vibrotactile sleeve with contact-generated audio to form audiotactile feedback. A within-subject experiment with $32$ Chinese participants demonstrates that multimodal (haptic+audio) stimuli yield higher emotion decoding accuracy ($44.1\%$ across $10$ emotions) than either modality alone ($25\%$ for touch, $31.6\%$ for sound), and gesture decoding generally benefits from multimodal cues. The VibroSleeve features a $5\times5$ motor grid controlled via PWM and is paired with noise-cancelling headphones. The stimuli draw on a dataset of $84$ audio clips and $84$ tactile sequences derived from $28$ participants, with stimuli selected by $k$-means and translated into vibrotactile patterns. The results underscore the complementary roles of haptic and auditory cues in affective human–robot interaction and offer design guidance for socially expressive robots that leverage multisensory feedback to enhance emotional communication.
Abstract
Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5×5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared to single modalities; (2) each individual channel (vibration or sound) effectively supported the recognition of certain emotions, with distinct advantages depending on the emotional expression; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.
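
The decoding accuracy compared across modalities can be illustrated with a minimal sketch. It assumes that accuracy is the fraction of trials in which the label chosen by a participant matches the emotion that was presented, and that each trial offers all ten emotions as options, so guessing would give 1/10. The function name, the trial format, and the toy trials are hypothetical and are not taken from the study's data or analysis code.

```python
from collections import defaultdict


def decoding_accuracy(trials):
    """Return {modality: fraction of trials whose chosen label equals the presented one}.

    `trials` is an iterable of (modality, presented_label, chosen_label) tuples.
    This trial format is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for modality, presented, chosen in trials:
        total[modality] += 1
        if presented == chosen:
            correct[modality] += 1
    return {modality: correct[modality] / total[modality] for modality in total}


# Hypothetical toy trials, not data from the experiment.
trials = [
    ("touch", "anger", "anger"),
    ("touch", "fear", "sadness"),
    ("sound", "anger", "anger"),
    ("sound", "fear", "fear"),
    ("combined", "anger", "anger"),
    ("combined", "fear", "fear"),
]

accuracy = decoding_accuracy(trials)

# Chance level under the assumption of a uniform guess among ten emotions.
chance_level = 1 / 10
```

Under that assumption, each reported emotion decoding accuracy can be read against a 10% guessing baseline.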
