XBIDetective: Leveraging Vision Language Models for Identifying Cross-Browser Visual Inconsistencies

Balreet Grewal, James Graham, Jeff Muizelaar, Jan Honza Odvarko, Suhaib Mujahid, Marco Castelluccio, Cor-Paul Bezemer

TL;DR

XBIDetective is a pipeline driven by a vision language model (VLM) that detects cross-browser inconsistencies (XBIs) by comparing paired renderings of a website in Firefox and Chrome. To avoid false positives, the method explicitly flags advertisements and dynamic elements before making the final XBI decision. On a dataset of 1,052 websites, the fine-tuned VLM identifies XBIs with up to 79% accuracy and high precision. A large-scale evaluation on 1,695 websites shows that the approach can reveal XBIs, while also highlighting the challenges posed by dynamic content and pop-ups. Practical use cases include automated regression testing and large-scale website monitoring. Overall, the work demonstrates that prompting VLMs to separate variable content from XBIs can yield effective, scalable XBI detection for real-world browser development workflows.

Abstract

Browser rendering bugs can be challenging for browser developers to detect, as they may be triggered by very specific conditions that occur on only a very small subset of websites. Cross-browser inconsistencies (XBIs), which are variations in how a website is interpreted and displayed in different browsers, can be helpful guides for detecting such rendering bugs. Although visual and Document Object Model (DOM)-based analysis techniques exist for detecting XBIs, they often struggle with dynamic and interactive elements. In this study, we discuss our industry experience with using vision language models (VLMs) to identify XBIs. We present the XBIDetective tool, which automatically captures screenshots of a website in Mozilla Firefox and Google Chrome and analyzes them with a VLM for XBIs. We evaluate XBIDetective's performance with an off-the-shelf and a fine-tuned VLM on 1,052 websites. We show that, when using the fine-tuned VLM, XBIDetective can identify cross-browser discrepancies with 79% accuracy and detect dynamic elements and advertisements with 84% and 85% accuracy, respectively. We discuss important lessons learned, and we present several potential practical use cases for XBIDetective, including automated regression testing, large-scale monitoring of websites, and rapid triaging of XBI bug reports.
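
The capture-then-analyze flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `detect_xbi`, `VlmAnswer`, `capture`, and `ask_vlm` are hypothetical, and the browser and the model are passed in as callables so that the sketch runs with stubs instead of a real browser or VLM.

```python
from dataclasses import dataclass
from typing import Callable

# Raw image bytes; the actual screenshot format is not given in the source.
Screenshot = bytes


@dataclass
class VlmAnswer:
    """Hypothetical structured reply from the VLM for one screenshot pair."""
    has_advertisement: bool
    has_dynamic_element: bool
    has_xbi: bool


def detect_xbi(
    url: str,
    capture: Callable[[str, str], Screenshot],
    ask_vlm: Callable[[Screenshot, Screenshot], VlmAnswer],
) -> VlmAnswer:
    """Capture the page in both browsers and hand the pair to the VLM."""
    firefox_shot = capture("firefox", url)
    chrome_shot = capture("chrome", url)
    return ask_vlm(firefox_shot, chrome_shot)


# Stub implementations so the sketch runs without a browser or a model.
def fake_capture(browser: str, url: str) -> Screenshot:
    return f"{browser}:{url}".encode()


def fake_vlm(firefox_shot: Screenshot, chrome_shot: Screenshot) -> VlmAnswer:
    # Pretend any byte-level difference is an XBI; a real VLM would reason
    # about the rendered images and also fill in the two other flags.
    return VlmAnswer(
        has_advertisement=False,
        has_dynamic_element=False,
        has_xbi=firefox_shot != chrome_shot,
    )


answer = detect_xbi("https://example.com", fake_capture, fake_vlm)
```

Injecting `capture` and `ask_vlm` keeps the orchestration testable; in practice they would wrap a browser automation tool and a model endpoint, neither of which is specified here.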

Paper Structure

This paper contains 24 sections, 3 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Overview of XBIDetective.
  • Figure 2: Example of overlay process with two screenshots taken of a website (https://www.crave.ca/en) with a dynamically changing carousel. Note that only 2 of the 5 screenshots used for the overlay are shown for brevity.
  • Figure 3: Overview of experimental setup.
  • Figure 4: Confusion matrix of XBIDetective_base's labelling of the impact score.
  • Figure 5: Confusion matrix of XBIDetective_thinking's labelling of the impact score.
  • ...and 2 more figures
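
The Figure 2 caption mentions overlaying five screenshots of a page with a changing carousel. The exact overlay procedure is not described in this summary, so the following is only one plausible way to locate dynamic regions: mark every pixel whose value varies across repeated captures. The function name `dynamic_mask`, the grayscale grid representation, and the `tolerance` parameter are all assumptions made for illustration.

```python
from typing import List

# Grayscale pixel grid, one int per pixel (an assumed representation).
Image = List[List[int]]


def dynamic_mask(frames: List[Image], tolerance: int = 0) -> List[List[bool]]:
    """Mark pixels whose value range across frames exceeds `tolerance`."""
    if not frames or not frames[0]:
        raise ValueError("need at least one non-empty frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same dimensions")
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            # A pixel is "dynamic" if it differs between any two captures.
            row.append(max(values) - min(values) > tolerance)
        mask.append(row)
    return mask


# Five tiny 2x3 "screenshots": only the middle pixel of the top row changes,
# standing in for a rotating carousel.
frames = [
    [[10, v, 10], [20, 20, 20]]
    for v in (0, 50, 100, 150, 200)
]
mask = dynamic_mask(frames)
```

A mask like this could be used to highlight or exclude variable regions before two browsers' renderings are compared, which matches the stated goal of keeping dynamic content from being reported as an XBI.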