Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People

Ricardo Gonzalez; Jazmin Collins; Shiri Azenkot; Cynthia Bennett

TL;DR

This study investigates how blind and low vision (BLV) people use AI-powered scene description applications, as distinct from the remote human assistance studied in prior work. Through a two-week diary study with 16 BLV participants, the authors identify use cases, goals, content types, and contexts. They also quantify the trust, satisfaction, and accuracy of AI-generated descriptions. They reveal both challenges specific to AI and general visual-access challenges that BLV people face. The findings show that accuracy strongly influences trust and satisfaction, and that users often rely on prior knowledge to interpret imperfect outputs. The work highlights design opportunities to tailor AI outputs to user contexts, to distinguish AI-enabled use from human assistance, and to guide future improvements in AI-driven accessibility tools. The results contribute a detailed use-case taxonomy and practical guidance for building more reliable, user-aligned AI scene description systems for BLV users, especially as AI capabilities continue to evolve.

Abstract

"Scene description" applications that describe the visual content of a photo are useful daily tools for blind and low vision (BLV) people. Researchers have studied their use, but only for applications that rely on remote sighted assistants. Little is known about applications that use AI to generate their descriptions. To investigate the use cases of such AI-powered applications, we conducted a two-week diary study in which 16 BLV participants used an AI-powered scene description application we designed. Through their diary entries and follow-up interviews, participants shared their information goals and their assessments of the visual descriptions they received. We analyzed the entries and found frequent use cases, such as identifying visual features of known objects, and surprising ones, such as avoiding contact with dangerous objects. We also found that participants rated the descriptions relatively low on average: 2.76 out of 5 (SD=1.49) for satisfaction and 2.43 out of 4 (SD=1.16) for trust. These scores indicate that descriptions still need significant improvement to deliver satisfying and trustworthy experiences. We discuss future opportunities for AI as it becomes a more powerful accessibility tool for BLV users.

Paper Structure

This paper contains 37 sections, 17 figures, and 6 tables.

Introduction
Related Work
Human-Powered Visual Interpretation Systems
AI-Powered Scene Description Systems
Image Description
Methods
Participants
Procedure
Scene Description Application
Qualitative Analysis
Quantitative Analysis
Findings
Overview of Use Cases
User Goals
Overview of User Goals.
...and 22 more sections

Figures (17)

Figure 1: The scene description application we used to collect data. Screenshots show the flow of using the application and submitting a diary entry. It includes five screens: the photo submission, the photo description, and three diary entry question screens (see the Procedure section for the questionnaire questions). The interface was designed to group similar questions while minimizing the number of elements on each screen.
Figure 2: Two images exemplifying unique user goals. The image on the left, submitted by P11, shows a black dog's head lying on a white cat. P11 wanted an interpretation that narrated the sentimental moment. The image on the right, submitted by P14, shows an empty chapel with wooden pews. P14 wanted an interpretation to check whether anyone else was present, so she could determine how much privacy she would have in the space.
Figure 3: Examples of connected photos featuring the same subjects at varying distances. Participants used the resulting interpretations to work out the best distance for obtaining an accurate result. On the left are two photos of the same bedroom window, submitted by P10. P10 wanted to learn how far away he needed to be from a photo subject for it to be fully captured and recognized by the application. On the right are two photos of a corded telephone, submitted by P16, who took them to “improve” her framing of the photo.
Figure 4: Normalized frequency of user goals observed in the diary entries. Each value represents how frequently a user goal occurred (the same values as in the normalized goal-frequency table). Values are rounded up for legibility.
Figure 5: A photo of an empty hotel bed, submitted by P5 to verify that his roommate was not sleeping in it. P5 took this photo because he did not want to check with his hands whether his roommate was in the bed.
...and 12 more figures
