ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation
Boyin Yang, Puming Jiang, Per Ola Kristensson
TL;DR
ImageTalk addresses the low text-entry rates of AAC for people living with motor neuron disease by combining image recognition with text generation driven by a large language model, producing richer, controllable narratives with substantial keystroke savings. The authors follow a triple-diamond design process involving proxy users and end users, reporting keystroke savings of 95.6% and high user satisfaction, and they distill three design guidelines plus four levels of acceptance for AI-generated content. The work demonstrates how multimodal cues from images, combined with prompts and steering, can improve the quality and practicality of AAC storytelling. The authors propose an open-source release of ImageTalk to accelerate further research and development in AI-assisted AAC.
Abstract
People living with Motor Neuron Disease (plwMND) frequently encounter speech and motor impairments that make them reliant on augmentative and alternative communication (AAC) systems. This paper tackles a central challenge: traditional symbol-based AAC systems offer a limited vocabulary, while text-entry solutions tend to have low communication rates. To help plwMND articulate their needs regarding the system efficiently and effectively, we iteratively design and develop a novel multimodal text generation system called ImageTalk through a tailored process comprising a proxy-user-based design phase and an end-user-based design phase. The system demonstrates pronounced keystroke savings of 95.6%, coupled with consistent performance and high user satisfaction. We distill three design guidelines for AI-assisted text generation systems and outline four user requirement levels tailored for AAC purposes, guiding future research in this field.
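To make the keystroke-savings figure concrete, the sketch below uses the definition commonly applied in AAC text-entry research: the percentage of characters in the final text that the user did not have to enter individually. The abstract does not state the exact formula the authors used, so this definition is an assumption, and the function name and input numbers are hypothetical, chosen only to show what a 95.6% rate looks like.

```python
def keystroke_savings(keystrokes_used: int, text_length: int) -> float:
    """Return keystroke savings as a percentage.

    Assumed definition (not confirmed by the abstract):
        KS = (1 - keystrokes_used / text_length) * 100
    where text_length is the number of characters in the produced text.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return (1 - keystrokes_used / text_length) * 100.0


# Hypothetical example: 11 user actions yield a 250-character message.
savings = keystroke_savings(keystrokes_used=11, text_length=250)
print(f"{savings:.1f}%")
```

Under this definition, a savings rate of 95.6% means the user performs roughly one input action for every 23 characters of generated text.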
