ScreenAudit: Detecting Screen Reader Accessibility Errors in Mobile Apps Using Large Language Models
Mingyuan Zhong, Ruolin Chen, Xia Chen, James Fogarty, Jacob O. Wobbrock
TL;DR
This work introduces ScreenAudit, an LLM-powered accessibility auditor that traverses Android app screens using a TalkBack-enabled recorder, captures transcripts and UI data, and leverages GPT-4o to identify screen reader accessibility errors beyond the reach of traditional rule-based checkers. In expert evaluations across 14 screens, ScreenAudit achieved 69.2% coverage with 71.3% precision, significantly outperforming baseline tools. The study also analyzes prompting strategies, showing that general accessibility guidance combined with contextual prompts yields the best recall, and demonstrates that ScreenAudit can complement, not replace, existing checkers. The findings suggest a practical path toward faster, more expressive accessibility feedback and motivate future work on broader UI element coverage, richer context, simulated user testing, and potential code-level integration.
Abstract
Many mobile apps are inaccessible, thereby excluding people from their potential benefits. Existing rule-based accessibility checkers aim to mitigate these failures by identifying errors early during development but are constrained in the types of errors they can detect. We present ScreenAudit, an LLM-powered system designed to traverse mobile app screens, extract metadata and transcripts, and identify screen reader accessibility errors overlooked by existing checkers. We recruited six accessibility experts, including one screen reader user, to evaluate ScreenAudit's reports across 14 unique app screens. Our findings indicate that ScreenAudit achieves an average coverage of 69.2%, compared to only 31.3% with a widely used accessibility checker. Expert feedback indicated that ScreenAudit delivered higher-quality feedback and addressed more aspects of screen reader accessibility than existing checkers, and that ScreenAudit would benefit app developers in real-world settings.
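The coverage and precision figures above can be read as set-overlap metrics averaged over screens. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes coverage is the share of expert-identified errors that the tool also reported, and precision is the share of tool-reported errors that experts confirmed. The function names, the screen names, and the toy issue labels are all hypothetical.

```python
from statistics import mean


def coverage(reported: set[str], expert: set[str]) -> float:
    """Assumed definition: share of expert-identified errors the tool also reported."""
    if not expert:
        return 0.0
    return len(reported & expert) / len(expert)


def precision(reported: set[str], expert: set[str]) -> float:
    """Assumed definition: share of tool-reported errors that experts confirmed."""
    if not reported:
        return 0.0
    return len(reported & expert) / len(reported)


# Hypothetical toy data: screen name -> (errors reported by the tool, errors found by experts).
screens = {
    "login": (
        {"unlabeled button", "wrong reading order", "decorative image announced"},
        {"unlabeled button", "wrong reading order", "missing heading", "unclear label"},
    ),
    "settings": (
        {"unlabeled switch"},
        {"unlabeled switch", "ambiguous link text"},
    ),
}

# Average each metric across screens, mirroring the "average coverage" wording above.
avg_coverage = mean(coverage(rep, exp) for rep, exp in screens.values())
avg_precision = mean(precision(rep, exp) for rep, exp in screens.values())

print(f"average coverage:  {avg_coverage:.1%}")
print(f"average precision: {avg_precision:.1%}")
```

Matching errors by exact label is a simplification made for this sketch; in an expert evaluation, deciding whether a reported issue corresponds to a real error is a judgment call made by the reviewers.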
