Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development

Sabrina Haque, Christoph Csallner

TL;DR

This work addresses the challenge of missing or low-quality alt-text for mobile UI icons during app development by introducing AltIcon, an IDE-integrated approach that generates context-aware alt-text using two fine-tuned models: a text-only model (AltIcon-TextT) and a multimodal model (AltIcon-MMT). AltIcon leverages DOM context, in-icon OCR text, and structured prompts to produce descriptive alt-text without requiring a full screen, reducing downstream technical debt. Empirical evaluation on the WC20-derived icon dataset shows AltIcon outperforms state-of-the-art post-development approaches and zero-shot GPT-4o baselines, with AltIcon-MMT delivering the strongest human-evaluated quality, especially on partial screens. The work demonstrates the value of shift-left accessibility in development workflows and provides publicly available data, code, and replication materials for broader adoption and extension to other platforms.

Abstract

Alt-text is essential for mobile app accessibility, yet UI icons often lack meaningful descriptions, limiting accessibility for screen reader users. Existing approaches either require extensive labeled datasets, struggle with partial UI contexts, or operate post-development, increasing technical debt. We first conduct a formative study to determine when and how developers prefer to generate icon alt-text. We then explore the ALTICON approach for generating alt-text for UI icons during development using two fine-tuned models: a text-only large language model that processes extracted UI metadata and a multi-modal model that jointly analyzes icon images and textual context. To improve accuracy, the method extracts relevant UI information from the DOM tree, retrieves in-icon text via OCR, and applies structured prompts for alt-text generation. Our empirical evaluation with the most closely related deep-learning and vision-language models shows that ALTICON generates alt-text that is of higher quality while not requiring a full-screen input.
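
As a rough illustration of the structured-prompt step described above, the following minimal sketch gathers icon, parent, sibling, and activity information from a toy DOM tree, adds in-icon text, and formats everything as a text prompt. It is not the authors' implementation: the `Node` class, the `describe` and `build_prompt` helpers, the prompt wording, and all resource ids are hypothetical, and the OCR result is passed in as a plain string rather than computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one view in a UI DOM tree."""
    cls: str
    resource_id: str = ""
    text: str = ""
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child view and return it."""
        child.parent = self
        self.children.append(child)
        return child


def describe(node: Node) -> str:
    """Render the non-empty attributes of a view as one line."""
    parts = [node.cls]
    if node.resource_id:
        parts.append(f"id={node.resource_id}")
    if node.text:
        parts.append(f"text={node.text!r}")
    return " ".join(parts)


def build_prompt(icon: Node, activity: str, ocr_text: str = "") -> str:
    """Assemble an assumed prompt layout from icon, parent, sibling,
    and activity context plus any in-icon text."""
    parent = icon.parent
    siblings = [c for c in parent.children if c is not icon] if parent else []
    lines = [
        "Task: write a short alt-text for the icon described below.",
        f"Activity: {activity}",
        f"Icon: {describe(icon)}",
        f"Parent: {describe(parent) if parent else 'none'}",
        "Siblings: " + ("; ".join(describe(s) for s in siblings) or "none"),
        f"In-icon text (OCR): {ocr_text or 'none'}",
        "Alt-text:",
    ]
    return "\n".join(lines)


# Toy example: a rewind button that shows "15" inside the icon.
controls = Node("LinearLayout", resource_id="player_controls")
rewind = controls.add(Node("ImageButton", resource_id="btn_rewind"))
controls.add(Node("ImageButton", resource_id="btn_play"))
prompt = build_prompt(rewind, activity="PlayerActivity", ocr_text="15")
print(prompt)
```

Because the prompt is built only from the icon's immediate neighborhood in the tree, this sketch needs no full-screen input, which mirrors the partial-screen setting the abstract describes. The resulting string would then be passed to a text-only model, or paired with the icon image for a multimodal one.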

Paper Structure

This paper contains 29 sections, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Example zoom out (left) vs. lower volume (right) minus buttons in Rico.
  • Figure 2: On adding an icon, AltIcon extracts icon (I), parent (P), sibling (S), and activity (A) DOM tree info and passes it with icon image-extracted info to one of two fine-tuned models; ground truth: "go back 15 seconds", "rewind 15 seconds"; PaliGemma: "refresh"; Pix2Struct: "toggle autoplay"; AltIcon-TextT: "go back"; AltIcon-MMT: "go back 15 seconds".
  • Figure 3: In-icon text examples from Rico.
  • Figure 4: Sample double-anonymous user survey screens (RQ4): target icon (red box), icon's parent (green box, not marked in survey or tool input), app name: ground truth (ref), PaliGemma-448c (PG), AltIcon-TextT & AltIcon-MMTi (AltIcon).
  • Figure 5: Screens where AltIcon infers sub-optimal alt-text; target icon (red box), icon's parent (green box, not marked for tools), app name: ground truth (ref), PaliGemma-448c (PG), AltIcon-TextT (TextT), AltIcon-MMTi (MMT), TextT & MMT (AltIcon).