Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts

Bhagyashri Tushir, Vikram K Ramanna, Yuhong Liu, Behnam Dezfouli

TL;DR

This work tackles IoT device identification in dynamic Wi-Fi environments by shifting from content-based packet inspection to latency-based fingerprinting. It defines device latency and introduces an accumulation score that captures the instantaneous wireless-channel dynamics affecting latency measurements. The authors develop a practical data-collection and machine-learning (ML) framework that uses four probe types and latency-plus-accumulation-score features. Validated on a real testbed with more than 125,000 samples and multiple ML algorithms, the framework achieves F1 scores above 97% and remains robust under channel variations. The approach preserves user privacy by limiting captured data to probe type and latency, while offering a scalable, hardware-friendly deployment suitable for residential networking equipment.

Abstract

Identifying IoT devices is crucial for network monitoring, security enforcement, and inventory tracking. However, most existing identification methods rely on deep packet inspection, which raises privacy concerns and adds computational complexity. More importantly, existing works overlook the impact of wireless channel dynamics on the accuracy of layer-2 features, which limits their effectiveness in real-world scenarios. In this work, we define and use the latency of specific probe-response packet exchanges, referred to as "device latency," as the main feature for device identification. Additionally, we reveal the critical impact of wireless channel dynamics on the accuracy of device identification based on device latency. Specifically, this work introduces the "accumulation score" as a novel approach to capturing fine-grained channel dynamics and their impact on device latency when training machine learning models. We implement the proposed methods and measure the accuracy and overhead of device identification in real-world scenarios. The results confirm that by incorporating the accumulation score into balanced data collection and into the training of machine learning algorithms, we achieve an F1 score of over 97% for device identification, even amid wireless channel dynamics. This is a significant improvement over the 75% F1 score achieved when the impact of channel dynamics on data collection and device latency is disregarded.
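The F1 scores above combine per-device precision and recall into a single number. As a minimal sketch, assuming a macro average (every device class weighted equally; the abstract does not state which averaging the authors use), the score can be computed from true and predicted device labels in plain Python. The function name `macro_f1` and the device labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per device class, then average with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correctly identified device
        else:
            fp[p] += 1  # sample wrongly attributed to device p
            fn[t] += 1  # true device t was missed
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels for illustration only; not data from the paper.
y_true = ["plug", "plug", "camera", "camera", "speaker", "speaker"]
y_pred = ["plug", "camera", "camera", "camera", "speaker", "plug"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.656
```

Here the per-class F1 values are 0.5 (plug), 0.8 (camera), and about 0.667 (speaker). A weighted average, which scales each class by its sample count, would give a different number on imbalanced data, so the averaging choice matters when comparing reported scores.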

Paper Structure

This paper contains 16 sections, 2 equations, 12 figures, and 1 table.

Figures (12)

  • Figure 1: Device latency ($l$) is defined as the interval from $t_4$ to $t_6$. Round-Trip Time (RTT) is defined as the interval from the crafting of a probe packet by the until the reception of the response packet (generated by the device) by the . Extracting device latency ($t_4$ to $t_6$) from RTT ($t_1$ to $t_7$) is challenging because the interval from $t_1$ to $t_4$ varies.
  • Figure 2: The testbed components, including machines used for background traffic generation, IoT devices, and a sniffer. Individual experiments use subsets of these components, depending on their specific objectives and requirements.
  • Figure 3: Inter-packet intervals when the is sending a probe packet every 1 ms. The horizontal dashed line represents the 95th percentile, and the vertical line marks the point where the inter-packet interval curve crosses the 95th percentile.
  • Figure 4: Device latency ($l_{tcp}^{lo}$) for various devices and ranges. Results were collected using low-payload (0 B) TCP-SYN probe packets. Circles, squares, and horizontal lines represent maximum, minimum, and median values, respectively.
  • Figure 5: A sample probe-response packet exchange with Predecessor and Successor packets. The accumulation score associated with each device latency value is computed from three main parameters: the durations of the Predecessor and Successor packets, the inter-packet intervals, and the value obtained by projecting the packets' midpoints onto a Bell or Gamma curve.
  • ...and 7 more figures
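The timing quantities in Figures 1, 3, and 4 can be illustrated with a minimal Python sketch. It assumes the four bounding timestamps of each exchange ($t_1$, $t_4$, $t_6$, $t_7$) have already been captured, which the Figure 1 caption notes is the hard part in practice. The `Exchange` record, the helper names, and all numbers are hypothetical, and the percentile uses linear interpolation via `statistics.quantiles(..., method="inclusive")`, which may differ from the authors' computation.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Exchange:
    """Hypothetical record of one probe-response exchange (times in microseconds).

    Only the timestamps bounding RTT (t1..t7) and device latency (t4..t6) are kept;
    what each timestamp marks is defined in the paper's Figure 1.
    """
    t1: int
    t4: int
    t6: int
    t7: int

    @property
    def rtt(self) -> int:
        # RTT spans the whole exchange, t1 -> t7.
        return self.t7 - self.t1

    @property
    def device_latency(self) -> int:
        # Device latency l spans only t4 -> t6.
        return self.t6 - self.t4


def summarize(latencies):
    """Minimum, median, and maximum: the per-device statistics shown in Figure 4."""
    return min(latencies), statistics.median(latencies), max(latencies)


def inter_packet_intervals(send_times):
    """Gaps between consecutive probe transmissions, as plotted in Figure 3."""
    ordered = sorted(send_times)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def p95(values):
    """95th percentile, linearly interpolated between order statistics."""
    return statistics.quantiles(values, n=100, method="inclusive")[94]


# Illustrative timestamps only; not measurements from the paper.
exchanges = [
    Exchange(t1=0, t4=350, t6=1150, t7=1600),
    Exchange(t1=1000, t4=1420, t6=2320, t7=2900),
    Exchange(t1=2000, t4=2300, t6=3000, t7=3500),
]
latencies = [e.device_latency for e in exchanges]
print("device latency:", latencies)               # [800, 900, 700]
print("RTT:", [e.rtt for e in exchanges])         # [1600, 1900, 1500]
print("min/median/max:", summarize(latencies))    # (700, 800, 900)

# Probe send times targeting a 1 ms (1000 us) period, with jitter.
gaps = inter_packet_intervals([0, 1000, 2010, 2990, 4000, 5100])
print("inter-packet gaps:", gaps)                 # [1000, 1010, 980, 1010, 1100]
print("95th percentile:", p95(gaps))              # 1082.0
```

The example shows why RTT alone is a noisy fingerprint: the three RTTs differ by up to 400 us, while the device-latency component differs by only 200 us; the rest of the spread comes from the $t_1$ to $t_4$ and $t_6$ to $t_7$ portions of the path.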