GeMID: Generalizable Models for IoT Device Identification

Kahraman Kostas, Rabia Yasa Kostas, Mike Just, Michael A. Lones

TL;DR

GeMID tackles IoT device identification (DI) with a strong emphasis on cross-environment generalizability. It introduces a two-stage framework that first performs feature and model selection using a genetic algorithm with external feedback, then validates the resulting device-level models on independent data from different networks. The results show that packet-header features generalize across environments better than flow and sliding-window statistics, and that the statistical approaches common in the literature are unreliable for DI across networks. The work offers a practical path toward deployable edge-based DI systems and underscores the need for cross-dataset validation in IoT security research.

Abstract

With the proliferation of devices on the Internet of Things (IoT), ensuring their security has become paramount. Device identification (DI), which distinguishes IoT devices based on their traffic patterns, plays a crucial role in both differentiating devices and identifying vulnerable ones, closing a serious security gap. However, existing approaches to DI that build machine learning models often overlook the challenge of model generalizability across diverse network environments. In this study, we propose a novel framework to address this limitation and to evaluate the generalizability of DI models across datasets collected within different network environments. Our approach involves a two-step process. First, we develop a feature and model selection method that is more robust to generalization issues by using a genetic algorithm with external feedback and datasets from distinct environments to refine the selections. Second, the resulting DI models are tested on further independent datasets to robustly assess their generalizability. We demonstrate the effectiveness of our method by empirically comparing it to alternatives, highlighting how fundamental limitations of commonly employed techniques such as sliding window and flow statistics restrict their generalizability. Moreover, we show that statistical methods, widely used in the literature, are unreliable for device identification due to their dependence on network-specific characteristics rather than device-intrinsic properties, challenging the validity of a significant portion of existing research. Our findings advance research in IoT security and device identification, offering insight into improving model effectiveness and mitigating risks in IoT networks.
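
The two-step process described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the synthetic data generator (`make_env`), the nearest-centroid classifier, the one-point crossover, the single-bit mutation and all parameter values are assumptions chosen only to keep the example self-contained. What it aims to show is the idea of external feedback: each candidate feature mask is trained on one environment and scored on a different one, and the selected mask is then checked on a third, held-out environment.

```python
import random

def make_env(rng, n, shift):
    """Hypothetical two-device traffic: features 0-1 are device-intrinsic,
    features 2-3 depend on the network environment through `shift`."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        intrinsic = [label + rng.gauss(0, 0.3), -label + rng.gauss(0, 0.3)]
        network = [shift * label + rng.gauss(0, 0.3),
                   shift * (1 - label) + rng.gauss(0, 0.3)]
        X.append(intrinsic + network)
        y.append(label)
    return X, y

def centroid_accuracy(train, test, mask):
    """Fit a nearest-centroid classifier on `train` and return its accuracy
    on `test`, using only the features switched on in `mask`."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    (X_tr, y_tr), (X_te, y_te) = train, test
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]

    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[i] - m) ** 2 for i, m in zip(idx, centroids[c])))

    return sum(predict(x) == t for x, t in zip(X_te, y_te)) / len(y_te)

def select_features(train, feedback, n_features, rng, pop_size=12, generations=15):
    """Genetic search over feature masks. Fitness is accuracy on the external
    `feedback` environment rather than on the training environment."""
    def fitness(mask):
        return centroid_accuracy(train, feedback, mask)

    population = [[True] * n_features]  # seed with the all-features mask
    population += [[rng.random() < 0.5 for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)    # single-bit mutation
            child[flip] = not child[flip]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
env_a = make_env(rng, 200, shift=2.0)   # training environment
env_b = make_env(rng, 200, shift=-2.0)  # feedback environment (another network)
env_c = make_env(rng, 200, shift=0.5)   # independent environment for the final check

mask, feedback_score = select_features(env_a, env_b, 4, rng)
all_features_score = centroid_accuracy(env_a, env_b, [True] * 4)
held_out_score = centroid_accuracy(env_a, env_c, mask)
print("selected mask:", mask)
print("feedback accuracy:", feedback_score, "vs all features:", all_features_score)
print("held-out accuracy:", held_out_score)
```

Because the all-features mask is seeded into the initial population and the best half of each generation is carried over unchanged, the selected mask can never score below the all-features baseline on the feedback environment. The held-out score is the number that speaks to generalizability, since `env_c` plays no role in the search.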

Paper Structure

This paper contains 52 sections, 4 equations, 22 figures, 16 tables.

Figures (22)

  • Figure 1: Raw bytes and headers of a network packet from the Fitbit Aria WiFi-enabled weighing device.
  • Figure 2: Devices represented in the UNSW-DI (blue) and UNSW-AD (yellow) datasets, along with their intersection (grey).
  • Figure 3: Devices represented in the MonIoTr dataset from both UK (blue) and USA (yellow) sites, along with their intersection (grey). Adapted from ren-imc19.
  • Figure 4: (Left) Data usage from UNSW datasets across different evaluation contexts. CV, SS, and DD steps are visualized in shades of blue, red, and green, respectively, throughout the paper for clarity. In the nomenclature, CV|AD-S1 denotes cross-validation on the AD dataset session 1. For cases other than CV, the first dataset is used for training and the second for testing. For example, AD-S1|DI-S2 means the first session of the UNSW-AD dataset is used for training and the second session of the UNSW-DI dataset is used for testing. (Right) The legend displayed is shared across Figures \ref{fig:cizgi}-\ref{fig:vote8} for consistency and clarity.
  • Figure 5: Comparison of feature utility in UNSW datasets measured using CV (blue) and isolated methods, SS (red) and DD (green). CV tends to overestimate feature utility, with higher scores for many attributes. SS and DD produce more realistic evaluations. The discrepancy highlights the potential for information leakage in cross-validation and the importance of using isolated validation methods for assessing feature utility in ML-based DI models. For the legend, please refer to Figure 4.
  • ...and 17 more figures