Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development
Sabrina Haque, Christoph Csallner
TL;DR
This work addresses the challenge of missing or low-quality alt-text for mobile UI icons during app development by introducing AltIcon, an IDE-integrated approach that generates context-aware alt-text using two fine-tuned models: a text-only model (AltIcon-TextT) and a multimodal model (AltIcon-MMT). AltIcon leverages DOM context, in-icon OCR text, and structured prompts to produce descriptive alt-text without requiring a full screen as input. Because it runs during development rather than afterward, it reduces downstream technical debt. Empirical evaluation on the WC20-derived icon dataset shows AltIcon outperforms state-of-the-art post-development approaches and zero-shot GPT-4o baselines, with AltIcon-MMT delivering the strongest human-evaluated quality, especially on partial screens. The work demonstrates the value of shift-left accessibility in development workflows and provides publicly available data, code, and replication materials for broader adoption and extension to other platforms.
Abstract
Alt-text is essential for mobile app accessibility, yet UI icons often lack meaningful descriptions, limiting accessibility for screen reader users. Existing approaches either require extensive labeled datasets, struggle with partial UI contexts, or operate post-development, increasing technical debt. We first conduct a formative study to determine when and how developers prefer to generate icon alt-text. We then explore the ALTICON approach for generating alt-text for UI icons during development using two fine-tuned models: a text-only large language model that processes extracted UI metadata and a multimodal model that jointly analyzes icon images and textual context. To improve accuracy, the method extracts relevant UI information from the DOM tree, retrieves in-icon text via OCR, and applies structured prompts for alt-text generation. Our empirical evaluation against the most closely related deep-learning and vision-language models shows that ALTICON generates higher-quality alt-text without requiring full-screen input.
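The three steps named in the abstract (DOM context extraction, in-icon OCR text, structured prompting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the layout XML, the attribute names, the `collect_context` and `build_prompt` helpers, and the prompt fields are all hypothetical, and the OCR result is passed in as a plain string rather than computed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified view hierarchy for a partial screen (assumption:
# real DOM trees carry many more attributes than `id` and `text`).
LAYOUT = """
<LinearLayout id="toolbar">
  <TextView id="title" text="Shopping Cart"/>
  <ImageButton id="btn_checkout"/>
  <ImageButton id="btn_delete"/>
</LinearLayout>
"""


def collect_context(root, icon_id):
    """Return metadata for the icon with `icon_id`, or None if it is absent."""
    for parent in root.iter():
        children = list(parent)
        for child in children:
            if child.get("id") == icon_id:
                # Visible text of sibling nodes serves as nearby context.
                sibling_texts = [
                    c.get("text") for c in children
                    if c is not child and c.get("text")
                ]
                return {
                    "icon_id": icon_id,
                    "icon_class": child.tag,
                    "parent_class": parent.tag,
                    "parent_id": parent.get("id", ""),
                    "sibling_texts": sibling_texts,
                }
    return None


def build_prompt(context, ocr_text=""):
    """Assemble a structured prompt; the field layout is an assumption."""
    lines = [
        "Task: write concise alt-text for a mobile UI icon.",
        f"Icon class: {context['icon_class']}",
        f"Icon resource id: {context['icon_id']}",
        f"Parent: {context['parent_class']} ({context['parent_id']})",
        "Nearby text: " + (", ".join(context["sibling_texts"]) or "none"),
        "Text inside icon (OCR): " + (ocr_text or "none"),
        "Alt-text:",
    ]
    return "\n".join(lines)


root = ET.fromstring(LAYOUT)
ctx = collect_context(root, "btn_checkout")
prompt = build_prompt(ctx, ocr_text="")  # OCR output would be supplied here
print(prompt)
```

The sketch only needs the icon's parent and siblings, which mirrors the claim that a full screen is not required. In a text-only setting the resulting prompt would be sent to the fine-tuned language model; a multimodal variant would pair it with the icon image.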
