Assessing Web Fingerprinting Risk

Enrico Bacis, Igor Bilogrevic, Robert Busa-Fekete, Asanka Herath, Antonio Sartori, Umar Syed

TL;DR

This study is based on actual visited pages and Web APIs reported by tens of millions of real Chrome browsers in the wild, and it accounts for the dependencies and correlations among Web APIs, which is crucial for obtaining more realistic entropy estimates.

Abstract

Modern Web APIs allow developers to provide extensively customized experiences for website visitors, but the richness of the device information they provide also makes them vulnerable to being abused to construct browser fingerprints, device-specific identifiers that enable covert tracking of users even when cookies are disabled. Previous research has established entropy, a measure of information, as the key metric for quantifying fingerprinting risk. However, earlier studies had two major limitations. First, their entropy estimates were based on either a single website or a very small sample of devices. Second, they did not adequately consider correlations among different Web APIs, potentially grossly overestimating their fingerprinting risk. We provide the first study of browser fingerprinting that addresses the limitations of prior work. Our study is based on actual visited pages and Web APIs reported by tens of millions of real Chrome browsers in the wild. We accounted for the dependencies and correlations among Web APIs, which is crucial for obtaining more realistic entropy estimates. We also developed a novel experimental design that accurately and efficiently estimates entropy while never observing too much information from any single user. Our results provide an understanding of the distribution of entropy for different website categories, confirm the utility of entropy as a fingerprinting proxy, and offer a method for evaluating browser enhancements that are intended to mitigate fingerprinting.
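To make the point about correlations concrete, the sketch below computes plug-in Shannon entropy for two hypothetical surfaces, screen width and device pixel ratio, that are perfectly correlated across eight made-up browsers. Adding the two individual entropies gives 2 bits, while the entropy of the combined value is only 1 bit: this is the kind of overestimate that arises when Web APIs are treated as independent. The surface names, values, and the helper `entropy_bits` are illustrative assumptions, not data or code from the study.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Plug-in Shannon entropy (bits) of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical readings: each tuple is one browser's (screen width, pixel ratio).
# The two surfaces are perfectly correlated in this toy sample.
observations = [
    (1920, 1.0), (1920, 1.0), (1920, 1.0), (1920, 1.0),
    (390, 3.0), (390, 3.0), (390, 3.0), (390, 3.0),
]
widths = [w for w, _ in observations]
ratios = [r for _, r in observations]

h_width = entropy_bits(widths)
h_ratio = entropy_bits(ratios)
h_joint = entropy_bits(observations)  # entropy of the combined fingerprint

# Summing marginals assumes independence; the joint entropy does not.
print(h_width, h_ratio, h_width + h_ratio, h_joint)  # → 1.0 1.0 2.0 1.0
```

The joint entropy never exceeds the sum of the marginals, with equality only when the surfaces are independent, so summing per-API entropies can only overstate the information a fingerprint carries.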

Paper Structure (24 sections, 5 theorems, 21 equations, 10 figures, 1 algorithm)

Key Result

Theorem 1

Using the notation above, for any $\epsilon > \frac{2(k-1)}{n}$ and $n \in \mathbb{N}_+$, it holds that …; and for any $\epsilon > 0$ and $n \in \mathbb{N}_+$, it holds that …
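The displayed inequalities of Theorem 1 did not survive in this summary, so they are not reconstructed here. As background only, and assuming (our assumption) that $k$ is the number of distinct values a surface can take and $n$ the number of sampled browsers, the sketch below checks a classical fact about the plug-in entropy estimator $\hat H$: since $\mathbb{E}[\hat H] = H(p) - \mathbb{E}[D(\hat p \,\|\, p)]$, the estimator is biased downward, and the bias is at most $\log(1 + (k-1)/n)$ nats. This identity is one reason relative-entropy concentration results, such as the Agrawal20 result listed among the theorems, are natural tools for entropy estimation; whether the paper uses it this way is not stated here. The distribution `p` and all helper names are hypothetical. The expectation is computed exactly by enumerating every possible count vector, so no random sampling is involved.

```python
import math

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts, p):
    """Probability of exactly these counts in sum(counts) i.i.d. draws from p."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # each partial quotient is an exact integer
    prob = float(coef)
    for c, q in zip(counts, p):
        prob *= q ** c
    return prob

def plug_in_entropy(counts, n):
    """Entropy (nats) of the empirical distribution; empty cells contribute 0."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def plug_in_bias(p, n):
    """Exact H(p) - E[H_hat] for n samples, enumerating all count vectors."""
    true_h = -sum(q * math.log(q) for q in p)
    expected = sum(
        multinomial_pmf(counts, p) * plug_in_entropy(counts, n)
        for counts in compositions(n, len(p))
    )
    return true_h - expected

# Hypothetical surface with k = 3 values and a skewed distribution.
p = [0.5, 0.3, 0.2]
for n in (5, 20):
    bound = math.log(1 + (len(p) - 1) / n)
    print(f"n={n}: bias={plug_in_bias(p, n):.4f} nats, bound={bound:.4f} nats")
```

Exact enumeration is only practical for small $n$ and $k$; it is used here so the bias is computed rather than estimated from random draws.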

Figures (10)

  • Figure 1: Two possible value distributions for a Web API, (a) low entropy and (b) high entropy.
  • Figure 2: Surface call frequency for several website verticals
  • Figure 3: Surface clustering determined by pairwise correlations. Clusters are outlined by colored rectangles.
  • Figure 4: Surface clustering determined by mutual information. Clusters are indicated by ovals.
  • Figure 5: Entropy distribution of the web
  • ...and 5 more figures
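Figures 3 and 4 describe clustering surfaces by pairwise correlation and by mutual information, but the exact procedure is not given in this summary. The sketch below shows one simple way to do the mutual-information variant, under our own assumptions: estimate $I(X;Y) = H(X) + H(Y) - H(X,Y)$ for every pair of surfaces from per-browser readings, link pairs whose mutual information meets a threshold, and report connected components (via union-find) as clusters. The surface names, the data, the helper names, and `threshold=0.5` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entropy_bits(values):
    """Plug-in Shannon entropy (bits) of a list of observed values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information_bits(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy_bits(xs) + entropy_bits(ys) - entropy_bits(list(zip(xs, ys)))

def cluster_by_mi(columns, threshold):
    """Group surfaces linked by pairwise MI >= threshold (connected components)."""
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(columns, 2):
        if mutual_information_bits(columns[a], columns[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical readings of four surfaces; position i is browser i.
columns = {
    "screen.width":     [1920, 1920, 390, 390, 1440, 1440, 390, 1920],
    "devicePixelRatio": [1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 3.0, 1.0],
    "timezone":         ["UTC", "PST", "UTC", "PST", "UTC", "PST", "UTC", "PST"],
    "language":         ["en", "fr", "en", "fr", "en", "fr", "en", "fr"],
}
print(cluster_by_mi(columns, threshold=0.5))
# → [['devicePixelRatio', 'screen.width'], ['language', 'timezone']]
```

Connected components make clustering transitive: two weakly related surfaces can land in one cluster through a shared neighbor. Lowering the threshold in this toy example (for instance to 0.01) merges all four surfaces into a single cluster.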

Theorems & Definitions (7)

  • Theorem 1
  • proof
  • Theorem 2: Theorem 1.2 of Agrawal20
  • Corollary 1
  • Theorem 3
  • Theorem 4
  • proof