RAVEN: Realtime Accessibility in Virtual ENvironments for Blind and Low-Vision People
Xinyun Cao, Kexin Phyllis Ju, Chenglin Li, Venkatesh Potluri, Dhruv Jain
TL;DR
RAVEN addresses unequal access to virtual 3D environments for blind and low-vision (BLV) users by letting them query and modify scenes in real time through natural language. The approach combines semantic scene graphs, self-voicing feedback, and runtime code generation (via GROMIT) to apply user-directed accessibility changes in Unity scenes. In a study with eight BLV participants, the system showed high usability and flexible interaction, but also exposed substantial challenges in accuracy, trust, and verification that must be mitigated through guardrails, automated metadata, and collaborative verification. The work shifts accessibility control from static, developer-defined presets to dynamic, user-driven adaptations, with broad implications for conversational programming and GenAI-assisted accessibility tools. Overall, RAVEN shows both the promise and the practical hurdles of deploying GenAI-based accessibility in immersive environments, and points toward future improvements in reliability, safety, and scalability.
Abstract
As virtual 3D environments become prevalent, equitable access is crucial for blind and low-vision (BLV) users, who face challenges with spatial awareness, navigation, and interaction. To address this gap, previous work has explored supplementing visual information with auditory and haptic modalities. However, these methods are static and offer limited support for dynamic, in-context adaptation. Recent work in generative AI enables users to query and modify 3D scenes via natural language, introducing a paradigm with increased flexibility and control for accessibility improvements. We present RAVEN, a system that responds to query or modification prompts from BLV users to improve the runtime accessibility of 3D virtual scenes. We evaluated the system with eight BLV people and uncovered key insights into the strengths and shortcomings of generative AI-driven accessibility in virtual 3D environments. The findings point to promising results as well as challenges related to system reliability and user trust.
