Customizing Generated Signs and Voices of AI Avatars: Deaf-Centric Mixed-Reality Design for Deaf-Hearing Communication

Si Chen, Haocong Cheng, Suzy Su, Stephanie Patterson, Raja Kushalnagar, Qi Wang, Yun Huang

TL;DR

This work addresses the communication barriers between d/Deaf and hearing individuals by exploring mixed-reality (MR) interfaces with AI-generated ASL signing and spoken English interpreting. Using participatory design with 15 DHH students, the study runs three sessions to elicit challenges, sketch overlay-driven features, and review designs, and analyzes the results through reflexive thematic analysis. It identifies six overlay features and presents design recommendations that balance ASL norms with hearing social norms, emphasizing authenticity, eye contact, and user control. The findings offer practical guidance for deploying MR-based interpreting tools as additive, customizable support to human interpreters, aimed at improving accessibility, personalization, and social inclusion in mixed-ability interactions.

Abstract

This study investigates innovative interaction designs for communication and collaborative learning between learners of mixed hearing and signing abilities, leveraging advancements in mixed reality technologies like Apple Vision Pro and generative AI for animated avatars. Adopting a participatory design approach, we engaged 15 d/Deaf and hard of hearing (DHH) students to brainstorm ideas for an AI avatar with interpreting ability (sign language to English, voice to English) that would facilitate their face-to-face communication with hearing peers. Participants envisioned the AI avatars addressing some issues with human interpreters, such as lack of availability, and providing affordable alternatives to expensive personalized interpreting services. Our findings indicate a range of preferences for integrating the AI avatars with actual human figures of both DHH and hearing communication partners. The participants highlighted the importance of having control over customizing the AI avatar, such as AI-generated signs, voices, facial expressions, and their synchronization for enhanced emotional display in communication. Based on our findings, we propose a suite of design recommendations that balance respecting sign language norms with adherence to hearing social norms. Our study offers insights on improving the authenticity of generative AI in scenarios involving specific, and sometimes unfamiliar, social norms.

Paper Structure (41 sections, 9 figures, 2 tables)

This paper contains 41 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: An overview of the study procedure.
  • Figure 2: Example Feature Sketches from Two Perspectives: How DHH Individuals See Hearing Individuals and How DHH Individuals Perceive Hearing Individuals Seeing Them.
  • Figure 3: A14's Feature Sketches showing herself at the airport, where she needs instant, fast, and accurate communication, envisioned using the proposed interpreting AI. She, like the majority of our participants, prefers 'No Overlay' and 'Partial Overlay' over 'Complete Overlay.' In her illustration, she emphasizes the significance of the DHH individual's control over whether to use the filter and fine-tune its settings, particularly when deciding whether to apply a facial overlay onto the hearing individual in communication.
  • Figure 4: Detailed design features suggested by DHH participants, as summarized by the researchers.
  • Figure 5: A10's drawing on the left shows the adjustable location of the avatar in a 'no overlay' view from a DHH individual's perspective (with signing hands and fingers). Her drawing also highlights that the whole body of the interpreting avatar needs to be within view for clear visibility, but not in the center, where it might overlap the other individual in the glass view. From a hearing individual's perspective, the hand/arm overlay was deemed unnecessary for DHH individuals. On the other hand, the use of a face overlay, which includes AI-generated mouth movements, was considered acceptable as long as it was deemed helpful by the hearing individual.
  • ...and 4 more figures