PURL: Safe and Effective Sanitization of Link Decoration

Shaoor Munir, Patrick Lee, Umar Iqbal, Zubair Shafiq, Sandra Siby

TL;DR

PURL is a machine-learning approach that uses a cross-layer graph representation of webpage execution to sanitize link decoration safely and effectively. It significantly outperforms existing countermeasures in accuracy and in reducing website breakage, and it is robust to common evasion techniques.

Abstract

While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client side to the server side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. To this end, we present PURL (pronounced purel-l), a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. PURL's deployment on a sample of top-million websites shows that link decoration is abused for tracking on nearly three-quarters of the websites, often to share cookies, email addresses, and fingerprinting information.

Paper Structure (28 sections, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Percentage of sites where the same link decoration by top domain appears and their primary usage. The shades of red and green show link decoration's usage as ATS and Non-ATS, respectively.
  • Figure 2: Total unique link decorations used by domains. The shades of red and green show link decoration's usage as ATS and Non-ATS, respectively.
  • Figure 3: Average number of link decorations used by Google endpoints (minimum 1000 requests across 20K sites)
  • Figure 4: Example URL with mixed link decorations, each marked as either Non-ATS or ATS. Resource paths are highlighted in green. The query parameters with keys source, ven, Tac, school, and Matchtype are used for functional purposes, while gclid contains an identifier that is used to track ad clicks.
  • Figure 5: Overview of PURL pipeline: (1) Webpage crawl using an instrumented browser; (2) Construction of a graph representation to represent the instrumented webpage execution information; (3) Feature extraction for graph nodes that represent link decorations; and (4) Classifier training to separate out ATS and Non-ATS link decorations.
  • ...and 5 more figures
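
To make the Figure 4 caption concrete, the following is a minimal sketch of what sanitizing a decorated link could look like once each query parameter has been labeled ATS or Non-ATS. It is not PURL's implementation: the URL, the `sanitize_url` function, and the hand-written `ATS_KEYS` set are hypothetical stand-ins (in PURL the labels would come from the trained classifier), and the sketch handles query parameters only, not resource paths or fragments.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: a hand-written stand-in for classifier output. Here only
# gclid is treated as ATS, matching the Figure 4 caption.
ATS_KEYS = {"gclid"}


def sanitize_url(url, ats_keys=ATS_KEYS):
    """Drop query parameters whose key is labeled ATS; keep everything else."""
    parts = urlsplit(url)
    # parse_qsl preserves parameter order; keep_blank_values retains
    # parameters such as "flag=" that carry an empty value.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ats_keys
    ]
    # Rebuild the URL with the filtered query; scheme, host, path, and
    # fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical decorated link using the parameter keys named in Figure 4.
decorated = (
    "https://example.com/apply"
    "?source=ad&ven=x&Tac=1&school=eng&Matchtype=e&gclid=abc123"
)
print(sanitize_url(decorated))
```

Removing only the parameters labeled ATS, rather than stripping the whole query string, is what keeps the functional parameters (source, ven, Tac, school, Matchtype) intact and so avoids breaking the page.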