Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting

Wolf Rieder; Philip Raschke; Thomas Cory

TL;DR

The paper explores using HTTP response headers for cross-browser web tracker classification in imbalanced settings, introducing a semi-automated ML pipeline that binarizes header presence and trains multiple ensemble classifiers on Chrome-derived data. It demonstrates high performance for Chrome and Firefox, with weaker results on Brave due to distributional differences, and shows that response headers outperform HTTP request headers for this task. The study provides a thorough cross-browser and longitudinal evaluation, highlighting both the promise and limitations of deployable, header-based tracker detection and offering guidance for integrating such signals into dynamic filter lists or reinforcement learning-based systems. These findings have practical implications for privacy tooling and the design of robust, scalable tracker detectors in real-world web environments.
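
To make "binarizes header presence" concrete, the following is a minimal sketch of how such an encoding could look. It is not the authors' pipeline: the function name `binarize_headers`, the lower-casing of header names, and the two example responses are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def binarize_headers(
    responses: List[Dict[str, str]],
) -> Tuple[List[str], List[List[int]]]:
    """Encode each response as a 0/1 vector of header presence.

    Header values are ignored; only whether a header name occurs matters.
    Names are lower-cased because HTTP field names are case-insensitive
    (an assumption about preprocessing, not taken from the paper).
    """
    # Build a sorted vocabulary of every header name seen in the data.
    vocabulary = sorted({name.lower() for resp in responses for name in resp})

    matrix = []
    for resp in responses:
        present = {name.lower() for name in resp}
        matrix.append([1 if header in present else 0 for header in vocabulary])
    return vocabulary, matrix


# Hypothetical responses, for illustration only.
responses = [
    {"Content-Length": "62", "X-XSS-Protection": "0"},
    {"Content-Length": "8068", "Cache-Control": "max-age=3600"},
]

vocabulary, matrix = binarize_headers(responses)
print(vocabulary)
print(matrix)
```

Each row of `matrix` can then be fed to any classifier that accepts binary features. In a cross-browser setting the vocabulary would presumably be fixed on the training data, so that headers seen only at test time are dropped rather than added as new columns.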

Abstract

The connectivity of the World Wide Web owes much to the HTTP protocol, whose messages carry informative header fields that are relevant to disciplines such as web security and privacy, particularly web tracking. Although existing research uses HTTP request messages to identify web trackers, HTTP response headers are often overlooked. This study aims to design effective machine learning classifiers for web tracker detection using binarized HTTP response headers. The dataset consists of traffic from the Chrome, Firefox, and Brave browsers, collected with the traffic-monitoring browser extension T.EX. Ten supervised models were trained on Chrome data and tested across all browsers, including a Chrome dataset collected a year later. The models achieved high accuracy, F1-score, precision, and recall, and low log-loss, on Chrome and Firefox, but performed poorly on Brave, potentially because of its distinct data distribution and feature set. The results suggest that these classifiers are viable for web tracker detection. However, they have not yet been tested in real-world applications, and future work could distinguish between tracker types and draw on broader label sources.
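
The metrics named in the abstract behave differently under class imbalance, which is why reporting accuracy alone can mislead. The sketch below computes them from scratch on a small, made-up set of labels and predicted probabilities. Treating trackers as the positive class (label 1), the 0.5 decision threshold, and the helper name `binary_metrics` are assumptions for illustration, not details from the paper.

```python
import math
from typing import Dict, List


def binary_metrics(
    y_true: List[int], y_prob: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and log-loss for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)

    # Clip probabilities so that log(0) never occurs.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


# Made-up imbalanced example: 2 positives (trackers, by assumption), 6 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.1, 0.2, 0.6, 0.1, 0.3, 0.2]

metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

In this toy example accuracy is 0.75 while precision, recall, and F1 are each 0.5, which illustrates how the majority class can inflate accuracy.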

Paper Structure (33 sections, 3 equations, 10 figures, 9 tables)

Introduction
Related Work
Classification Based on HTTP Information
Other Detection Approaches
Summary
Approach
Preliminaries
Research Questions
Data Collection
Initial Response Header Analysis
Descriptive Analysis of Headers
Methodology
Problem Formulation
Machine Learning Pipeline
Classification Models
...and 18 more sections

Figures (10)

Figure 1: Frequency distribution of headers for each browser, indicating the frequency with which each header is present in the dataset.
Figure 2: Venn diagram of HTTP response headers in the examined datasets, revealing a core set of 5672 headers that are common across Chrome, Firefox, and Brave. Unique headers are present in each browser, although $\text{Chrome}_{23}$ has the highest number of unique headers, which may reflect browser-specific features or experimental headers.
Figure 3: Row (A) presents the ECDFs of the Content-Length header values for both Chrome datasets. We set a cut-off at 10,000 to highlight the core observation: more than half of the responses have a value below this threshold, with $\tilde{\mu}_{T22} = 62$, $\tilde{\mu}_{NT22} = 8068$, $\tilde{\mu}_{T23} = 45$, and $\tilde{\mu}_{NT23} = 8021.5$. Row (B) shows how similar the X-XSS-Protection header values are across $\text{Chrome}_{22}$, $\text{Firefox}_{22}$, and $\text{Brave}_{22}$.
Figure 4: The parameterized pipeline automatically processes the collected datasets and trains the classifiers.
Figure 5: t-SNE plots using a representative sample across all browsers to gauge data similarity. Separate clusters exist for trackers and non-trackers, but there is significant overlap between both classes. The results are separated by class to highlight differences between trackers and non-trackers.
...and 5 more figures
