Your voice is your voice: Supporting Self-expression through Speech Generation and LLMs in Augmented and Alternative Communication

Yiwen Xu, Monideep Chakraborti, Tianyi Zhang, Katelyn Eng, Aanchan Mohan, Mirjana Prpa

TL;DR

This paper addresses the limited expressivity of current AAC systems by proposing Speak Ease, an AAC platform that combines multimodal input (text, voice, and contextual cues) with an LLM-driven processing layer and personalized TTS. The approach enables context-aware, emotionally resonant outputs and supports both input and output expressivity through mechanisms such as refining dysarthric speech, expanding emojis into sentences, and voice personalization. A feasibility study with speech-language pathologists indicates both the potential to enhance expressivity and the need to balance complexity, user autonomy, and ethical considerations such as authenticity and potential facilitation. The work highlights the practical impact of integrating multimodal inputs and LLMs in AAC to create richer, more authentic communication experiences for users with speech impairments, and outlines concrete directions for future refinement and ethical safeguards.

Abstract

In this paper, we present Speak Ease: an augmentative and alternative communication (AAC) system to support users' expressivity by integrating multimodal input, including text, voice, and contextual cues (conversational partner and emotional tone), with large language models (LLMs). Speak Ease combines automatic speech recognition (ASR), context-aware LLM-based outputs, and personalized text-to-speech technologies to enable more personalized, natural-sounding, and expressive communication. Through an exploratory feasibility study and focus group evaluation with speech and language pathologists (SLPs), we assessed Speak Ease's potential to enable expressivity in AAC. The findings highlight the priorities and needs of AAC users and the system's ability to enhance user expressivity by supporting more personalized and contextually relevant communication. This work provides insights into the use of multimodal inputs and LLM-driven features to improve AAC systems and support expressivity.
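
The pipeline described in the abstract (text or voice input, contextual cues, LLM-based output, personalized text-to-speech) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: `Context`, `build_prompt`, `speak`, and the prompt wording are assumptions, not Speak Ease's actual implementation, and the ASR, LLM, and TTS components are passed in as plain callables rather than tied to any real service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Contextual cues named in the abstract (field names are assumptions)."""
    partner: str  # conversational partner, e.g. "doctor"
    tone: str     # emotional tone, e.g. "calm"


def build_prompt(draft: str, ctx: Context) -> str:
    """Combine the user's draft with the contextual cues into one LLM prompt."""
    return (
        f"Rewrite the user's message for a conversation with their {ctx.partner}, "
        f"in a {ctx.tone} tone. Keep the user's meaning.\n"
        f"Message: {draft}"
    )


def speak(
    text: Optional[str],
    audio: Optional[bytes],
    ctx: Context,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one turn: typed text or ASR transcript -> LLM suggestion -> speech audio."""
    if text is not None:
        draft = text            # typed input is used as-is
    elif audio is not None:
        draft = asr(audio)      # voice input is transcribed first
    else:
        raise ValueError("either text or audio input is required")
    suggestion = llm(build_prompt(draft, ctx))
    return tts(suggestion)      # a personalized voice would be applied here
```

This sketch synthesizes the LLM output directly. A real system would presumably let the user review, edit, or reject the suggestion before it is spoken, which matters for the autonomy and authenticity concerns the paper raises.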

Paper Structure

This paper contains 53 sections and 4 figures.

Figures (4)

  • Figure 2: System Workflow
  • Figure 3: Emotion and Context Setting
  • Figure 4: Example of Context-Aware Suggestions Generated by LLM
  • Figure 5: Voice Output Personalization Process