Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting
Wolf Rieder, Philip Raschke, Thomas Cory
TL;DR
The paper explores HTTP response headers as features for cross-browser web tracker classification in an imbalanced setting. It introduces a semi-automated ML pipeline that binarizes header presence and trains multiple ensemble classifiers on Chrome-derived data. The classifiers perform well on Chrome and Firefox but noticeably worse on Brave, possibly because of its distinct data distribution and feature set, and response headers outperform HTTP request headers for this task. The study includes a cross-browser and longitudinal evaluation, highlights both the promise and the limitations of deployable, header-based tracker detection, and offers guidance for integrating such signals into dynamic filter lists or reinforcement learning-based systems. The findings have practical implications for privacy tooling and for the design of robust, scalable tracker detectors in real-world web environments.
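To make the binarization step concrete, here is a minimal sketch of how response headers might be turned into presence features. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the example responses, and the choice to build the vocabulary from training data only are all hypothetical.

```python
from typing import Dict, List


def build_vocabulary(responses: List[Dict[str, str]]) -> List[str]:
    """Collect the sorted set of lower-cased header names seen in training responses."""
    names = set()
    for headers in responses:
        # HTTP header names are case-insensitive, so normalize before collecting.
        names.update(name.lower() for name in headers)
    return sorted(names)


def binarize(headers: Dict[str, str], vocabulary: List[str]) -> List[int]:
    """Return 1 where a vocabulary header is present in the response, else 0.

    Header values are ignored; only presence is encoded.
    """
    present = {name.lower() for name in headers}
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical training responses, not taken from the paper's dataset.
train_responses = [
    {"Content-Type": "image/gif", "Set-Cookie": "id=abc", "P3P": "CP=NOI"},
    {"Content-Type": "text/html", "Cache-Control": "max-age=3600"},
]
vocabulary = build_vocabulary(train_responses)
features = [binarize(r, vocabulary) for r in train_responses]

# A response from another browser may carry headers that are not in the
# training vocabulary; in this sketch they are simply dropped.
unseen = binarize({"X-New-Header": "1", "content-type": "text/css"}, vocabulary)
```

In this sketch, headers that never appeared in the training data contribute nothing to the feature vector, which is one way a browser with a different header set could end up with a different feature distribution.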
Abstract
The World Wide Web's connectivity is largely attributable to HTTP, whose messages carry informative header fields that are of interest to disciplines such as web security and privacy, especially with regard to web tracking. Although existing research has used HTTP request messages to identify web trackers, HTTP response headers are often overlooked. This study aims to design effective machine learning classifiers for web tracker detection using binarized HTTP response headers. Data from the Chrome, Firefox, and Brave browsers, collected with the traffic-monitoring browser extension T.EX, serve as our dataset. Ten supervised models were trained on Chrome data and tested across all browsers, including on a Chrome dataset collected a year later. The results show high accuracy, F1-score, precision, and recall, together with low log-loss, for Chrome and Firefox, but subpar performance on Brave, potentially due to its distinct data distribution and feature set. The research suggests that these classifiers are viable for web tracker detection. However, testing in real-world applications remains pending, and future studies could explore the distinction between tracker types and the use of broader label sources.
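The metrics named above can be computed from true labels and predicted tracker probabilities. The sketch below uses the standard definitions; the labels, the probabilities, and the 0.5 threshold are hypothetical and are not results from the study. It also illustrates why accuracy alone can mislead when classes are imbalanced.

```python
import math
from typing import Dict, List


def evaluate(y_true: List[int], y_prob: List[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, and log-loss for binary labels."""
    eps = 1e-15  # clip probabilities so log() never receives 0
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


# Hypothetical imbalanced sample: 8 non-trackers (0) and 2 trackers (1).
# A model that always outputs a low tracker probability never flags a tracker.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.1] * 10
metrics = evaluate(y_true, y_prob)
```

On this toy sample the accuracy is 0.8 even though no tracker is detected, so recall and F1 are both 0. This is why the full set of metrics is more informative than accuracy in an imbalanced setting.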
