Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts
Bhagyashri Tushir, Vikram K Ramanna, Yuhong Liu, Behnam Dezfouli
TL;DR
This work tackles IoT device identification in dynamic Wi-Fi environments by shifting from content-based inspection to latency-based fingerprinting. It defines device latency and introduces an accumulation score that captures the instantaneous wireless-channel dynamics affecting latency measurements. The authors develop a practical data-collection and ML framework that uses four probe types with latency and accumulation-score features. They validate it on a real testbed with more than 125k samples and multiple algorithms, achieving F1 scores over 97% and demonstrating robustness under channel variations. The approach preserves user privacy by limiting captured data to probe type and latency information, while remaining scalable and suitable for deployment on residential networking equipment.
Abstract
Identifying IoT devices is crucial for network monitoring, security enforcement, and inventory tracking. However, most existing identification methods rely on deep packet inspection, which raises privacy concerns and adds computational complexity. More importantly, existing works overlook the impact of wireless channel dynamics on the accuracy of layer-2 features, which limits their effectiveness in real-world scenarios. In this work, we define the latency of specific probe-response packet exchanges, referred to as "device latency," and use it as the main feature for device identification. We also reveal the critical impact of wireless channel dynamics on the accuracy of device identification based on device latency. Specifically, this work introduces the "accumulation score" as a novel approach to capturing fine-grained channel dynamics and their impact on device latency when training machine learning models. We implement the proposed methods and measure the accuracy and overhead of device identification in real-world scenarios. The results confirm that using the accumulation score for balanced data collection and for training machine learning algorithms yields an F1 score of over 97% for device identification, even amidst wireless channel dynamics. This is a significant improvement over the 75% F1 score obtained when the impact of channel dynamics on data collection and device latency is disregarded.
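
As a rough illustration of the "device latency" feature described above, the following minimal sketch computes the latency of probe-response exchanges and averages it per probe type. It is not the authors' implementation: the function names, the tuple layout, the millisecond unit, and the probe labels probe_a and probe_b are all hypothetical, and the accumulation score is deliberately left out because its definition is not given here.

```python
from collections import defaultdict
from statistics import mean


def device_latency(t_probe_sent, t_response_received):
    """Latency of one probe-response exchange, in milliseconds.

    Assumption: both timestamps are in seconds on the same clock.
    """
    if t_response_received < t_probe_sent:
        raise ValueError("response timestamp precedes probe timestamp")
    return (t_response_received - t_probe_sent) * 1000.0


def mean_latency_by_probe(exchanges):
    """Average latency per probe type.

    `exchanges` is an iterable of (probe_type, t_sent, t_received) tuples;
    this layout is a hypothetical choice made for the sketch.
    """
    grouped = defaultdict(list)
    for probe_type, t_sent, t_received in exchanges:
        grouped[probe_type].append(device_latency(t_sent, t_received))
    return {probe: mean(values) for probe, values in grouped.items()}


# Hypothetical measurements for a single device (timestamps in seconds).
exchanges = [
    ("probe_a", 0.000, 0.004),
    ("probe_a", 1.000, 1.006),
    ("probe_b", 2.000, 2.002),
]
features = mean_latency_by_probe(exchanges)
```

A feature vector of this kind, one value per probe type, could then be passed to a classifier. Which summary statistics and which models are appropriate is a design choice that this sketch does not settle.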
