MobRFFI: Non-cooperative Device Re-identification for Mobility Intelligence
Stepan Mazokha, Fanchen Bao, George Sklivanitis, Jason O. Hallstrom
TL;DR
This work addresses WiFi MAC address randomization, which hinders mobility analytics. It introduces MobRFFI, an encoder-based RF fingerprinting framework that extracts hardware-impairment fingerprints from WiFi preamble spectrograms and performs multi-receiver, RSSI-weighted re-identification with a vector database, enabling open-set identification of devices without relying on MAC addresses. The approach is validated on the WiSig dataset, where it reaches 100% single-day and 94% multi-day re-identification accuracy, and on a newly collected MobRFFI dataset, where fusing fingerprints from multiple receivers raises accuracy from 81% to 100% (single-day) and from 41% to 100% (multi-day). The work also presents spectrogram optimization, a large-scale multi-receiver dataset, and open-set evaluations, and discusses the practical impact for privacy-preserving mobility monitoring in urban environments.
Abstract
WiFi-based mobility monitoring in urban environments can provide valuable insights into pedestrian and vehicle movements. However, MAC address randomization is a significant obstacle to accurately estimating congestion levels and path trajectories. To overcome it, we consider radio frequency fingerprinting and re-identification for attributing WiFi traffic to emitting devices without the use of MAC addresses. We present MobRFFI, an AI-based device fingerprinting and re-identification framework for WiFi networks that leverages an encoder deep learning model to extract unique features based on WiFi chipset hardware impairments. The framework is entirely independent of frame type. When evaluated on the WiFi fingerprinting dataset WiSig, our approach achieves 94% and 100% device accuracy in multi-day and single-day re-identification scenarios, respectively. We also collect a novel dataset, MobRFFI, for granular multi-receiver WiFi device fingerprinting evaluation. Using this dataset, we demonstrate that combining fingerprints from multiple receivers boosts re-identification performance from 81% to 100% in a single-day scenario and from 41% to 100% in a multi-day scenario.
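To make the multi-receiver idea concrete, the following is a minimal sketch of RSSI-weighted fingerprint fusion followed by open-set nearest-neighbor matching. It is not the authors' implementation: the function names (`rssi_weights`, `fuse_fingerprints`, `reidentify`), the linear-power weighting rule, the cosine-similarity metric, and the 0.8 threshold are all assumptions chosen for illustration, and a brute-force NumPy search stands in for the vector database. The per-receiver embeddings are assumed to come from an already trained encoder.

```python
import numpy as np


def rssi_weights(rssi_dbm):
    """Turn RSSI readings (dBm) into weights that sum to 1.

    Assumption: weights are proportional to linear received power,
    so stronger receivers contribute more to the fused fingerprint.
    """
    rssi_dbm = np.asarray(rssi_dbm, dtype=float)
    # Shift by the maximum before converting to linear scale for stability.
    linear = 10.0 ** ((rssi_dbm - rssi_dbm.max()) / 10.0)
    return linear / linear.sum()


def fuse_fingerprints(embeddings, rssi_dbm):
    """RSSI-weighted average of per-receiver embeddings, L2-normalized."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Normalize each receiver's embedding so only direction matters.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    fused = rssi_weights(rssi_dbm) @ unit
    return fused / np.linalg.norm(fused)


def reidentify(query, gallery, labels, threshold=0.8):
    """Return (label, similarity) of the closest enrolled device.

    Open-set behavior: if the best cosine similarity falls below
    `threshold` (an arbitrary illustrative value), the label is None,
    meaning the device is treated as previously unseen.
    """
    gallery = np.asarray(gallery, dtype=float)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = gallery @ query
    best = int(np.argmax(sims))
    score = float(sims[best])
    return (labels[best] if score >= threshold else None), score


# Toy example: two enrolled devices with 3-dimensional fingerprints.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = ["dev_a", "dev_b"]

# One frame observed by three receivers; the third has very weak signal.
observed = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0]]
rssi = [-40.0, -45.0, -80.0]

fused = fuse_fingerprints(observed, rssi)
print(reidentify(fused, gallery, labels))
```

In this toy input the weak third receiver disagrees with the other two, but its low RSSI gives it almost no weight, so the fused fingerprint still matches `dev_a`. A query that resembles no enrolled device falls below the threshold and is reported as unknown.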
