Exploring the Impact of Emotional Voice Integration in Sign-to-Speech Translators for Deaf-to-Hearing Communication

Hyunchul Lim, Minghan Gao, Franklin Mingzhe Li, Nam Anh Dang, Ianip Sit, Michelle M Olson, Cheng Zhang

TL;DR

This work tackles the lack of emotional nuance in ASL-to-speech/text translation by evaluating the role of AI-generated emotional voices in three output modes: subtitles only, neutral voice, and emotional voice. Through online surveys with hearing participants and feedback from DHH users, it demonstrates that emotional voice can significantly improve emotion recognition and mitigate misinterpretations caused by linguistic facial markers, though cognitive load and uneven benefits across emotions remain challenges. The study provides practical design guidelines, highlighting prioritized emotions, desired voice features, and the need for adaptive, natural-sounding synthesis to bridge communication gaps between Deaf and hearing communities. Overall, emotional-voice integration emerges as a promising direction for more expressive, effective multimodal ASL translation systems with tangible implications for HCI and CSCW-driven accessibility research.

Abstract

Emotional voice communication plays a crucial role in effective daily interactions. Deaf and hard-of-hearing (DHH) individuals often rely on facial expressions to supplement sign language to convey emotions, as the use of voice is limited. However, in American Sign Language (ASL), these facial expressions serve not only emotional purposes but also as linguistic markers, altering sign meanings and often confusing non-signers when interpreting a signer's emotional state. Most existing ASL translation technologies focus solely on signs, neglecting the role of emotional facial expressions in the translated output (e.g., text, voice). This paper presents studies that 1) confirm the challenges non-signers face in interpreting emotions from facial expressions in ASL communication, and 2) examine how integrating emotional voices into translation systems can enhance hearing individuals' comprehension of a signer's emotions. An online survey conducted with 45 hearing participants (non-ASL signers) revealed that they frequently misinterpret signers' emotions when emotional and linguistic facial expressions are used simultaneously. The findings indicate that incorporating emotional voice into translation systems significantly improves the recognition of signers' emotions by 32%. Additionally, further research involving 6 DHH participants discusses design considerations for the emotional voice feature from both perspectives, emphasizing the importance of integrating emotional voices in translation systems to bridge communication gaps between DHH and hearing communities.

Paper Structure

This paper contains 40 sections, 1 equation, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Example Survey Questions: After watching a video clip once, participants were asked to answer the questions related to .
  • Figure 2: Facial expressions in ASL convey emotions (top) and provide additional information, such as altering the meaning of signs, serving as linguistic markers (bottom).
  • Figure 3: Video Clips for Survey
  • Figure 4: Results comparing reading facial expressions alone (marked in blue) versus reading both facial expressions and the meanings of signs (marked in orange): (A) Emotion Perception, (B) Mental Effort in Emotion Perception, (C) Understanding Meaning from Translation, (D) Mental Effort in Understanding Sentences
  • Figure 5: Confusion Matrix over Three Conditions: displaying the performance of emotion perception when observing emotional facial expressions alone (marked in blue) compared to observing emotional facial expressions during signing (marked in orange).
  • ...and 4 more figures