Towards Scalable Web Accessibility Audit with MLLMs as Copilots

Ming Gu, Ziwei Wang, Sicen Lai, Zirui Gao, Sheng Zhou, Jiajun Bu

TL;DR

The paper tackles the scalability bottleneck of web accessibility audits by proposing AAA, a full-lifecycle framework that operationalizes WCAG-EM through Automation, AI, and Auditor. It introduces GRASP, a graph-based multimodal sampling method that uses textual, visual, and relational page representations to select a representative subset of a site's pages, and MaC, a multimodal LLM-powered copilot that assists auditors during sampling and manual evaluation. Four benchmark datasets (TPS, APR, CCT, CPE) are released to evaluate the web accessibility audit (WAA) pipeline and its ML-assisted components. Experiments show that small, fine-tuned MLLMs can approach the capabilities of larger models and that GRASP with a heterophilic GNN (IGNN) yields the best sampling quality, indicating practical viability for scalable accessibility auditing. By providing open benchmarks and an end-to-end methodology, the work lays groundwork for AI-assisted audits and could inform future standards and tooling.

Abstract

Ensuring web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-wide conformance evaluation, it demands substantial human effort and lacks practical support for execution at scale. In this work, we present an auditing framework, AAA, which operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a multimodal large language model-based copilot that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, empowering human auditors with AI-enhanced assistance for real-world impact. We further contribute four novel datasets designed for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods and show that small-scale language models can serve as capable experts when fine-tuned.
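
To make the idea of "representative page coverage via learned embeddings" concrete, the following is a minimal sketch of embedding-based page sampling. It is not the GRASP algorithm: the abstract does not specify how embeddings are fused or how pages are selected, so this sketch assumes simple concatenation of normalized text and visual vectors and swaps in greedy k-center (farthest-point) selection. Relational cues and the GNN are omitted entirely. The names `fuse`, `sample_pages`, and `seed_index` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_emb, visual_emb):
    """Assumed fusion rule: concatenate the normalized per-modality vectors."""
    return l2_normalize(text_emb) + l2_normalize(visual_emb)


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_pages(embeddings, k, seed_index=0):
    """Greedy k-center selection (an illustrative stand-in, not GRASP).

    Starting from seed_index, repeatedly pick the unchosen page that is
    farthest from every page chosen so far, so the sample spreads across
    the embedding space. Returns page indices in selection order.
    """
    if not embeddings or k <= 0:
        return []
    k = min(k, len(embeddings))
    chosen = [seed_index]
    # nearest[i] = distance from page i to its closest chosen page
    nearest = [distance(e, embeddings[seed_index]) for e in embeddings]
    while len(chosen) < k:
        candidates = [i for i in range(len(embeddings)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], distance(e, embeddings[nxt]))
    return chosen
```

With two near-duplicate "article" pages and two near-duplicate "form" pages, asking for two samples would pick one page from each group rather than two look-alikes, which is the behavior a representative sample is meant to have.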

Paper Structure

This paper contains 18 sections, 5 equations, 2 figures, 15 tables.

Figures (2)

  • Figure 1: Overview of AAA.
  • Figure 2: Overview of GRASP.