Duumviri: Detecting Trackers and Mixed Trackers with a Breakage Detector

He Shuang, Lianying Zhao, David Lie

TL;DR

Duumviri addresses the privacy risk posed by web trackers by integrating breakage detection into tracker detection and by using differential features derived from rendering traces. This enables more precise blocking of non-mixed trackers and partial blocking of mixed trackers. It employs two classifiers, one for trackers and one for breakage, both trained on differential features; the breakage detector reduces the misclassifications that lead to page breakage. Evaluated on 15K pages for non-mixed trackers, it achieves an adjusted accuracy of 97.44%; for mixed trackers, it achieves a conservative lower-bound accuracy of 74.19%. It also uncovers 22 previously unreported trackers and 26 mixed trackers. The work demonstrates practical utility by producing actionable rules, reporting findings to the community, and releasing artifacts that researchers and developers can use to improve block lists.

Abstract

Web tracking harms user privacy. As a result, the use of tracker detection and blocking tools is a common practice among Internet users. However, no such tool can be perfect, and thus there is a trade-off between avoiding breakage (caused by unintentionally blocking some required functionality) and neglecting to block some trackers. State-of-the-art tools usually rely on user reports and developer effort to detect breakages, which can be broadly attributed to two causes: 1) misidentifying non-trackers as trackers, and 2) blocking mixed trackers, which blend tracking with functional components. We propose incorporating a machine learning-based breakage detector into the tracker detection pipeline to automatically avoid misidentification of functional resources. For both tracker detection and breakage detection, we propose using differential features that can more clearly elucidate the differences caused by blocking a request. We designed and implemented a prototype of our proposed approach, Duumviri, for non-mixed trackers. We then adapt it to automatically identify mixed trackers, drawing differential features at partial-request granularity. In the case of non-mixed trackers, evaluating Duumviri on 15K pages shows its ability to replicate the labels of a human-generated filter list, EasyPrivacy, with an accuracy of 97.44%. Through a manual analysis, we find that Duumviri can identify previously unreported trackers and that its breakage detector can identify overly strict EasyPrivacy rules that cause breakage. In the case of mixed trackers, Duumviri is the first automated mixed tracker detector, and achieves a lower-bound accuracy of 74.19%. Duumviri has enabled us to detect and confirm 22 previously unreported unique trackers and 26 unique mixed trackers.

Paper Structure (47 sections, 5 figures, 15 tables)

This paper contains 47 sections, 5 figures, 15 tables.

Figures (5)

  • Figure 1: An example of an exception rule used to 'fix' page breakage. When 'fender.com' fetches 'gretel.min.js' from 'cdn.cquotient.com', the request is blocked because the domain is listed as a tracking server. However, this particular resource provides legitimate web page functionality (product recommendation), so blocking it causes missing page content. Privacy developers fix this issue by adding an exception rule for 'fender.com' [github_commit_fixing].
  • Figure 2: 1) Duumviri visits a page using two instrumented browser instances, each capable of producing a rendering trace. Both instances share a network cache. 2) Duumviri conducts differential analysis on the page instances and draws differential features independently for each of its detectors. 3) The detectors take the features and make predictions. The predictions determine whether the potential tracker is added to the filter list.
  • Figure 3: Duumviri discovered EasyPrivacy-caused site breakage on 'ero-advertising.com'.
  • Figure 4: Duumviri discovered a mixed tracker on 'myblogguest.com'.
  • Figure 5: A Duumviri-identified tracker on 'focus.de' with a file-level header comment that leads to external documentation.
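
The workflow summarized in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The trace representation (a list of event names), the count-difference features, the function names `differential_features` and `should_block`, and the two stand-in classifiers are all assumptions made for illustration. As a simplification, both detectors read the same feature dictionary here, whereas the caption says features are drawn independently for each detector.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical representations, chosen only for this sketch.
Trace = List[str]          # a rendering trace as a flat list of event names
Features = Dict[str, int]  # per-event count difference (blocked minus control)


def differential_features(control: Trace, blocked: Trace) -> Features:
    """Compare a page load with the request allowed (control) against a
    load with the request blocked, and return per-event count differences."""
    before, after = Counter(control), Counter(blocked)
    events = sorted(set(before) | set(after))
    return {event: after[event] - before[event] for event in events}


def should_block(
    features: Features,
    is_tracker: Callable[[Features], bool],
    causes_breakage: Callable[[Features], bool],
) -> bool:
    """Add a request to the filter list only if the tracker detector flags
    it and the breakage detector does not predict page breakage."""
    return is_tracker(features) and not causes_breakage(features)


# Toy traces and stand-in detectors (assumptions, not trained models).
control_trace = ["paint", "paint", "xhr", "layout"]
blocked_trace = ["paint", "layout"]
features = differential_features(control_trace, blocked_trace)


def toy_tracker(f: Features) -> bool:
    # Flags the request if blocking it removed at least one network event.
    return f.get("xhr", 0) < 0


def toy_breakage(f: Features) -> bool:
    # Predicts breakage if blocking removed two or more paint events.
    return f.get("paint", 0) <= -2


decision = should_block(features, toy_tracker, toy_breakage)
```

In this sketch the breakage detector acts as a veto: a request that looks like a tracker is still left unblocked when blocking it is predicted to break the page.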