MAC Address De-Randomization Using Multi-Channel Sniffers and Two-Stage Clustering
Giovanni Baccichet, Corrado Innamorati, Alessandro E. C. Redondi, Matteo Cesana
TL;DR
The paper addresses the tracking risks that persist despite MAC randomization in Wi-Fi Probe Requests by proposing a two-stage clustering framework that combines IE fingerprinting with a time-frequency multi-channel signature. It introduces a larger, more challenging dataset of devices with identical characteristics and demonstrates that multi-channel burst patterns discriminate between devices that IE-based methods cannot tell apart. The approach yields higher clustering quality (homogeneity, completeness, V-measure) and more accurate device counting in identical-device scenarios. The method is validated on two datasets, and the data are released for reproducible research. These findings have practical implications for traffic analysis and privacy-aware deployments in urban and mobility contexts.
Abstract
MAC randomization is a widely used technique implemented on most modern smartphones to protect users' privacy against tracking based on captured Probe Request frames. However, the technique has weaknesses that may still expose distinctive information, making it possible to track the device generating the Probe Requests. Techniques that exploit these weaknesses, known as MAC de-randomization algorithms, generally rely on the Information Elements (IEs) contained in the Probe Requests and use clustering to group together frames belonging to the same device. While effective on heterogeneous device types, such techniques cannot differentiate among devices of identical type running the same Operating System (OS). In this paper, we propose a MAC de-randomization technique that overcomes this weakness. First, we introduce a new dataset of Probe Requests captured from devices sharing the same characteristics. Second, we observe that the time-frequency pattern of Probe Request emission is unique to each device and can therefore be used as a discriminative feature. We embed this feature in a two-stage clustering methodology and show through experiments its effectiveness compared to state-of-the-art techniques based solely on IE fingerprinting. The original dataset used in this work is made publicly available for reproducible research.
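
The two-stage idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the frame fields (`t`, `ch`, `mac`, `ie`), the assumption that one randomized MAC corresponds to one burst, and the use of exact matching on the channel order in place of a real clustering algorithm are all simplifications made here for illustration.

```python
from collections import defaultdict

# Hypothetical frame records: timestamp (s), channel, randomized MAC, IE fingerprint.
# Devices 1 and 2 share the same IE fingerprint but scan channels in a different order.
frames = [
    {"t": 0.00, "ch": 1,  "mac": "m1", "ie": "A"},   # device 1, burst 1
    {"t": 0.05, "ch": 6,  "mac": "m1", "ie": "A"},
    {"t": 0.10, "ch": 11, "mac": "m1", "ie": "A"},
    {"t": 9.00, "ch": 1,  "mac": "m2", "ie": "A"},   # device 1, burst 2 (new MAC)
    {"t": 9.05, "ch": 6,  "mac": "m2", "ie": "A"},
    {"t": 9.10, "ch": 11, "mac": "m2", "ie": "A"},
    {"t": 1.00, "ch": 11, "mac": "m3", "ie": "A"},   # device 2, burst 1
    {"t": 1.05, "ch": 6,  "mac": "m3", "ie": "A"},
    {"t": 1.10, "ch": 1,  "mac": "m3", "ie": "A"},
    {"t": 8.00, "ch": 11, "mac": "m4", "ie": "A"},   # device 2, burst 2 (new MAC)
    {"t": 8.05, "ch": 6,  "mac": "m4", "ie": "A"},
    {"t": 8.10, "ch": 1,  "mac": "m4", "ie": "A"},
    {"t": 2.00, "ch": 1,  "mac": "m5", "ie": "B"},   # device 3, different model
    {"t": 2.05, "ch": 6,  "mac": "m5", "ie": "B"},
    {"t": 2.10, "ch": 11, "mac": "m5", "ie": "B"},
]


def stage_one(frames):
    """Stage 1: group frames by IE fingerprint (coarse, model-level grouping)."""
    groups = defaultdict(list)
    for frame in frames:
        groups[frame["ie"]].append(frame)
    return groups


def burst_signature(burst):
    """Toy time-frequency signature: channels of a burst in order of emission."""
    ordered = sorted(burst, key=lambda f: f["t"])
    return tuple(f["ch"] for f in ordered)


def stage_two(group):
    """Stage 2: inside one IE group, split bursts by their signature."""
    bursts = defaultdict(list)
    for frame in group:
        bursts[frame["mac"]].append(frame)  # assumption: one randomized MAC per burst
    clusters = defaultdict(list)
    for mac, burst in bursts.items():
        clusters[burst_signature(burst)].append(mac)
    return clusters


def two_stage(frames):
    """Assign one cluster label to every randomized MAC."""
    labels = {}
    next_label = 0
    for group in stage_one(frames).values():
        for macs in stage_two(group).values():
            for mac in macs:
                labels[mac] = next_label
            next_label += 1
    return labels


labels = two_stage(frames)
estimated_devices = len(set(labels.values()))
```

In this toy data, stage 1 alone would merge devices 1 and 2 because their IE fingerprints coincide; the signature in stage 2 is what separates them. Real captures are noisier, so a practical system would need a tolerant distance measure and a proper clustering algorithm rather than exact matching.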
