RAVEN: Realtime Accessibility in Virtual ENvironments for Blind and Low-Vision People

Xinyun Cao, Kexin Phyllis Ju, Chenglin Li, Venkatesh Potluri, Dhruv Jain

TL;DR

RAVEN tackles the inequity of access to virtual 3D environments for blind and low-vision users by enabling real-time, natural-language querying and modification of scenes. The approach fuses semantic scene graphs, self-voicing feedback, and runtime code generation (via GROMIT) to implement user-directed accessibility changes in Unity scenes. Across eight BLV participants, the study demonstrates high usability and flexible interaction, but also reveals substantial challenges in accuracy, trust, and verification that must be mitigated through guardrails, automated metadata, and collaborative verification. The work shifts accessibility control from static, developer-defined presets to dynamic, user-driven adaptations, with broad implications for conversational programming and GenAI-assisted accessibility tools. Overall, RAVEN reveals both the promise and the practical hurdles of deploying GenAI-based accessibility in immersive environments, guiding future improvements in reliability, safety, and scalability.

Abstract

As virtual 3D environments become prevalent, equitable access is crucial for blind and low-vision (BLV) users who face challenges with spatial awareness, navigation, and interactions. To address this gap, previous work explored supplementing visual information with auditory and haptic modalities. However, these methods are static and offer limited support for dynamic, in-context adaptation. Recent work in generative AI enables users to query and modify 3D scenes via natural language, introducing a paradigm with increased flexibility and control for accessibility improvements. We present RAVEN, a system that responds to query or modification prompts from BLV users to improve the runtime accessibility of 3D virtual scenes. We evaluated the system with eight BLV people, uncovering key insights into the strengths and shortcomings of generative AI-driven accessibility in virtual 3D environments, pointing to promising results as well as challenges related to system reliability and user trust.
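The query-and-modify interaction described above can be illustrated with a toy sketch. This is not RAVEN's implementation (the system targets Unity and generates code at runtime); the `SceneNode` structure and the `find`, `describe`, and `modify` helpers are hypothetical names introduced here only to show how a semantic scene graph might answer a query prompt with voiceable text and confirm a modification so the user can verify it.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One object in a toy semantic scene graph (hypothetical structure)."""
    name: str
    description: str
    position: tuple
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find(node, name):
    """Depth-first search for a node by name; returns None if absent."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def describe(root, name):
    """Answer a query prompt with text that could be voiced to the user."""
    node = find(root, name)
    if node is None:
        return f"No object named {name} in the scene."
    x, y, z = node.position
    return f"{node.name}: {node.description}, at ({x}, {y}, {z})."


def modify(root, name, prop, value):
    """Apply a modification and return a confirmation stating old and new values."""
    node = find(root, name)
    if node is None:
        return f"No object named {name} in the scene."
    old = node.properties.get(prop)
    node.properties[prop] = value
    return f"Changed {prop} of {name} from {old} to {value}."
```

In this sketch, every modification returns a message stating both the old and new values. That is one simple way to support the verification need the study highlights, since a BLV user cannot visually confirm that a change took effect.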

Paper Structure

This paper contains 69 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Examples of system usage. The center image shows the original game scene, with six surrounding panels illustrating accessibility-driven modifications: (A) change object color, (B) reposition the player, (C) enlarge a text object, (D) increase brightness, (E) amplify audio volume, and (F) adjust audio pitch. Bubbles and icons in E and F visualize auditory changes. These are only a few of the many types of modifications RAVEN can support to enable flexible, user-driven accessibility.
  • Figure 2: Screenshots of the three scenes used in our evaluation. Scene 1: a demo with simple objects and sound sources. Scene 2: a park with cats meowing and background nature sounds. Scene 3: a spaceship room with furniture, small items, and sci-fi sound sources.
  • Figure 3: Prompt usage and correctness across categories.