Exploring the Impact of Emotional Voice Integration in Sign-to-Speech Translators for Deaf-to-Hearing Communication
Hyunchul Lim, Minghan Gao, Franklin Mingzhe Li, Nam Anh Dang, Ianip Sit, Michelle M Olson, Cheng Zhang
TL;DR
This work tackles the lack of emotional nuance in ASL-to-speech/text translation by evaluating the role of AI-generated emotional voices in three output modes: subtitles only, neutral voice, and emotional voice. Through online surveys with hearing participants and feedback from DHH users, it demonstrates that emotional voice can significantly improve emotion recognition and mitigate misinterpretations caused by linguistic facial markers, though cognitive load and uneven benefits across emotions remain challenges. The study provides practical design guidelines, highlighting prioritized emotions, desired voice features, and the need for adaptive, natural-sounding synthesis to bridge communication gaps between Deaf and hearing communities. Overall, emotional-voice integration emerges as a promising direction for more expressive, effective multimodal ASL translation systems with tangible implications for HCI and CSCW-driven accessibility research.
Abstract
Emotional voice communication plays a crucial role in effective daily interactions. Deaf and hard-of-hearing (DHH) individuals often rely on facial expressions to supplement sign language when conveying emotions, as their use of voice is limited. However, in American Sign Language (ASL), these facial expressions serve not only emotional purposes but also as linguistic markers that alter sign meanings, which often confuses non-signers trying to interpret a signer's emotional state. Most existing ASL translation technologies focus solely on signs, neglecting the role of emotional facial expressions in the translated output (e.g., text, voice). This paper presents studies that 1) confirm the challenges non-signers face when interpreting emotions from facial expressions in ASL communication, and 2) examine how integrating emotional voices into translation systems can enhance hearing individuals' comprehension of a signer's emotions. An online survey conducted with 45 hearing participants (non-ASL signers) revealed that they frequently misinterpret signers' emotions when emotional and linguistic facial expressions are used simultaneously. The findings indicate that incorporating emotional voice into translation systems significantly improves the recognition of signers' emotions by 32%. Additionally, further research involving 6 DHH participants discusses design considerations for the emotional voice feature from both perspectives, emphasizing the importance of integrating emotional voices in translation systems to bridge communication gaps between DHH and hearing communities.
