Customizing Generated Signs and Voices of AI Avatars: Deaf-Centric Mixed-Reality Design for Deaf-Hearing Communication
Si Chen, Haocong Cheng, Suzy Su, Stephanie Patterson, Raja Kushalnagar, Qi Wang, Yun Huang
TL;DR
This work addresses the communication barriers between d/Deaf and hearing individuals by exploring mixed-reality (MR) interfaces with AI-generated ASL signing and spoken English interpreting. Using participatory design with 15 d/Deaf and hard of hearing (DHH) students, the study conducts three sessions to elicit challenges, sketch overlay-driven features, and review designs, and analyzes the data through reflexive thematic analysis. It identifies six overlay features and presents design recommendations that balance ASL norms with hearing social norms, emphasizing authenticity, eye contact, and user control. The findings offer practical guidance for deploying MR-based interpreting tools as additive, customizable support to human interpreters, with the aim of improving accessibility, personalization, and social inclusion in mixed-ability interactions.
Abstract
This study investigates interaction designs for communication and collaborative learning between learners with mixed hearing and signing abilities, leveraging advances in mixed reality technologies such as Apple Vision Pro and in generative AI for animated avatars. Adopting a participatory design approach, we engaged 15 d/Deaf and hard of hearing (DHH) students to brainstorm ideas for an AI avatar with interpreting ability (sign language to English, voice to English) that would facilitate their face-to-face communication with hearing peers. Participants envisioned AI avatars that would address some issues with human interpreters, such as limited availability, and provide affordable alternatives to expensive personalized interpreting services. Our findings indicate a range of preferences for integrating the AI avatars with the actual human figures of both DHH and hearing communication partners. Participants highlighted the importance of having control over customizing the AI avatar, including AI-generated signs, voices, facial expressions, and their synchronization, for enhanced emotional display in communication. Based on these findings, we propose a suite of design recommendations that balance respect for sign language norms with adherence to hearing social norms. Our study offers insights into improving the authenticity of generative AI in scenarios involving specific, and sometimes unfamiliar, social norms.
