Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design

Jingyi Xie, Rui Yu, He Zhang, Sooyeon Lee, Syed Masum Billah, John M. Carroll

TL;DR

The paper investigates how large multimodal models (LMMs) like Be My AI serve as cognitive extensions for people with visual impairments, examining daily use across home, spatial, social, and animal contexts. Using a qualitative study with 14 visually impaired participants, it demonstrates that Be My AI enhances perception and supports task performance, yet often remains non-goal-oriented and prone to AI hallucinations. The authors frame Be My AI within distributed cognition, showing how users offload visual processing to the artifact while adapting strategies and integrating other senses and human input. They derive design implications for goal-oriented, real-time, and reliable AI-powered assistive technologies to improve autonomy and safety for people with visual impairments.

Abstract

People with visual impairments perceive their environment non-visually and often use AI-powered assistive tools to obtain textual descriptions of visual information. Recent large vision-language model-based AI-powered tools like Be My AI are more capable of understanding users' inquiries in natural language and describing the scene in audible text; however, the extent to which these tools are useful to visually impaired users is currently understudied. This paper aims to fill this gap. Our study with 14 visually impaired users reveals that they are adapting these tools organically -- not only can these tools facilitate complex interactions in household, spatial, and social contexts, but they also act as an extension of users' cognition, as if the cognition were distributed in the visual information. We also found that although the tools are currently not goal-oriented, users accommodate this limitation and embrace the tools' capabilities for broader use. These findings enable us to envision design implications for creating more goal-oriented, real-time processing, and reliable AI-powered assistive technology.

Paper Structure (49 sections, 1 table)