ADCanvas: Accessible and Conversational Audio Description Authoring for Blind and Low Vision Creators
Franklin Mingzhe Li, Michael Xieyang Liu, Cynthia L. Bennett, Shaun K. Kane
TL;DR
ADCanvas reimagines audio description (AD) authoring for blind and low vision (BLV) creators by providing a screen-reader-friendly, non-visual workflow that combines a WebVTT editor, keyboard controls, and an instruction-based multimodal AI agent for live visual question answering (VQA) and drafting. In a study with 12 BLV creators, the system enabled independent AD authoring, with the agent serving as an information conduit, drafting assistant, and co-author, while revealing design needs around trust, verification, and fine-grained control. The work contributes empirical insights into human-AI co-creation in non-visual media, along with design implications for agent configurability, interaction modes, and accessibility-focused workflow integration. It points toward AI-assisted AD tools that are more autonomous yet remain human-centered, preserving professional standards and creative agency for BLV practitioners. As AI tools evolve, ADCanvas advocates for transparent, configurable, and co-creative interfaces that expand accessibility without undermining expert practice.
Abstract
Audio Description (AD) provides essential access to visual media for blind and low vision (BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually driven interfaces. We present ADCanvas, a multimodal authoring system that supports non-visual control over AD creation. ADCanvas combines conversational interaction with keyboard-based playback control and a plain-text, screen-reader-accessible editor to support end-to-end AD authoring and visual question answering (VQA). By pairing these screen-reader-friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while maintaining agency through verification and editing. For example, participants saw themselves as curators who received information from the model and filtered it down for their audience. Our findings offer design implications for accessible media tools, including precise editing controls, accessibility support for creative ideation, and configurable rules for human-AI collaboration.
