Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays

Cynthia Zastudil, Christine Holyfield, Christine Kapp, Xandria Crosland, Elizabeth Lorah, Tara Zimmerman, Stephen MacNeil

TL;DR

This paper investigates automating the just-in-time generation of communication options (COs) for Visual Scene Displays (VSDs) in augmentative and alternative communication (AAC) using a Large Multimodal Model (LMM), GPT-4V. It compares LMM-generated COs to those created by speech-language pathologists and AAC researchers through a human study (N=13) and expert evaluation (N=5), augmented by semi-structured clinician interviews (N=5). The results show that AI-generated COs are generally contextually relevant and often comparable in quality to human-created options, with context-dependent preferences across playing, storybook, and retelling scenarios. However, personalization gaps and concerns about developmentally appropriate content and bias indicate that AI should augment rather than replace clinician input. Future work should focus on personalized user models, monitoring/editing interfaces, and bias-mitigation strategies to enable safe, effective deployment in AAC devices.

Abstract

Millions of people worldwide rely on augmentative and alternative communication (AAC) devices to communicate. Visual scene displays (VSDs) can enhance communication for these individuals by embedding communication options within contextualized images. However, existing VSDs often present default images that may lack relevance or require manual configuration, placing a significant burden on communication partners. In this study, we assess the feasibility of leveraging large multimodal models (LMMs), such as GPT-4V, to automatically create communication options for VSDs. Communication options were sourced from an LMM and from speech-language pathologists (SLPs) and AAC researchers (N=13), then evaluated through an expert assessment conducted by the SLPs and AAC researchers. We present the study's findings, supplemented by insights from semi-structured interviews (N=5) about SLPs' and AAC researchers' opinions on the use of generative AI in AAC devices. Our results indicate that the communication options generated by the LMM were contextually relevant and often resembled those created by humans. However, vital questions remain that must be addressed before LMMs can be confidently implemented in AAC devices.

Paper Structure (9 sections, 2 figures, 1 table)

This paper contains 9 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: An example image which could be used in a VSD. Example COs generated by human participants and by the LMM are provided in the figure. The COs can be embedded within the image as clickable "hotspots" or as buttons presented on the display.
  • Figure 2: A comparison of the experts' ratings of COs generated by human participants and GPT-4V. Human-generated COs were preferred for the Playing context; however, for the Storybook and Retelling contexts, LMM-generated COs were preferred.