SINBAD: Saliency-informed detection of breakage caused by ad blocking
Saiid El Hajj Chehade, Sandra Siby, Carmela Troncoso
TL;DR
SINBAD detects website breakage caused by privacy-preserving filter lists by training on user-reported issues and using web saliency to drive targeted interactions. The method combines three innovations (forum-derived ground truth, saliency-informed crawling, and subtree-focused differential analysis) to detect breakage, including dynamic and CSS-based cases, with a reported 20% accuracy improvement over prior work. It demonstrates high discrimination at the subtree level and scalable evaluation across multiple datasets, and shows strong generalization in open-world tests. The practical impact is a proactive tool that lets maintainers test new rules before deployment, reducing user friction and improving the reliability of blocking tools.
Abstract
Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules.
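To make the third innovation more concrete, the following is a minimal sketch of subtree-level differential analysis. It assumes that a page snapshot taken without the filter rule (control) and one taken with it (treated) are available as simple trees. It is not SINBAD's implementation: the `Node` type, `signature`, and `differing_subtrees` are hypothetical names introduced here for illustration, and a real detector would work on full DOM features rather than tags and text alone.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a DOM element."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def signature(node: Node) -> Tuple:
    """Structural fingerprint of a subtree: tag, text, and child fingerprints."""
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def differing_subtrees(control: Node, treated: Node,
                       path: Optional[str] = None) -> List[str]:
    """Return paths of the deepest aligned subtrees that differ between snapshots."""
    here = path if path is not None else "/" + control.tag
    if signature(control) == signature(treated):
        return []  # identical subtrees: nothing to report
    # If the node itself changed, or children cannot be aligned one-to-one,
    # report this subtree as the unit of difference.
    if (control.tag != treated.tag
            or control.text != treated.text
            or len(control.children) != len(treated.children)):
        return [here]
    # Otherwise the difference lies deeper: recurse into aligned children.
    diffs: List[str] = []
    for i, (c, t) in enumerate(zip(control.children, treated.children)):
        diffs.extend(differing_subtrees(c, t, f"{here}/{c.tag}[{i}]"))
    return diffs


# Toy example (assumed data): the rule removes a video player inside a <div>.
control = Node("html", children=[Node("body", children=[
    Node("nav", "Menu"),
    Node("div", children=[Node("video", "player")]),
])])
treated = Node("html", children=[Node("body", children=[
    Node("nav", "Menu"),
    Node("div"),
])])

print(differing_subtrees(control, treated))
```

Reporting the deepest differing subtree, rather than flagging the whole page, is what lets a difference be traced back to a specific region and, in principle, to the rule that affected it.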
