From Flows to Functions: Macroscopic Behavioral Fingerprinting of IoT Devices via Network Services

Shayan Azizi, Norihiro Okui, Masataka Nakahara, Ayumu Kubota, Hassan Habibi Gharakheili

TL;DR

This work proposes a macroscopic, interpretable approach to IoT fingerprinting that models each device's behavior by the network services it accesses over extended periods. The authors formalize three representations (Service List, Service Prevalence, and Generalized (G)) and introduce a fingerprint exporter guided by a similarity threshold, which yields stable fingerprints that recur over time. The method demonstrates strong closed-set device identification and reasonable open-set performance on a large, long-term IPFIX dataset, and its tunable granularity trades responsiveness against robustness. The approach aligns with MUD concepts and offers a scalable, explainable alternative to fine-grained ML-based traffic classification for network security and policy enforcement.

Abstract

Identifying devices such as cameras, printers, voice assistants, or health monitoring sensors, collectively known as the Internet of Things (IoT), within a network is a critical operational task, particularly to manage the cyber risks they introduce. While behavioral fingerprinting based on network traffic analysis has shown promise, most existing approaches rely on machine learning (ML) techniques applied to fine-grained features of short-lived traffic units (packets and/or flows). These methods tend to be computationally expensive, sensitive to traffic measurement errors, and often produce opaque inferences. In this paper, we propose a macroscopic, lightweight, and explainable alternative to behavioral fingerprinting focusing on the network services (e.g., TCP/80, UDP/53) that IoT devices use to perform their intended functions over extended periods. Our contributions are threefold. (1) We demonstrate that IoT devices exhibit stable and distinguishable patterns in their use of network services over a period of time. We formalize the notion of service-level fingerprints and derive a generalized method to represent network behaviors using a configurable granularity parameter. (2) We develop a procedure to extract service-level fingerprints, apply it to traffic from 13 consumer IoT device types in a lab testbed, and evaluate the resulting representations in terms of their convergence and recurrence properties. (3) We validate the efficacy of service-level fingerprints for device identification in closed-set and open-set scenarios. Our findings are based on a large dataset comprising about 10 million IPFIX flow records collected over a 1.5-year period.
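
To make the idea of a service-level fingerprint concrete, the following is a minimal sketch of the three representations named in the TL;DR. It is not the authors' implementation: the flow tuples, the function names, the definition of prevalence as a per-service share of flows, and the interpretation of the granularity parameter `g` as a port-bin width are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical flow records reduced to (protocol, destination port).
# Real IPFIX records carry many more fields; only these two are used here.
flows = [
    ("TCP", 443), ("TCP", 443), ("UDP", 53), ("TCP", 80),
    ("UDP", 50001), ("UDP", 50777),
]


def service_list(flows):
    """Set of distinct services (protocol/port) seen in the window."""
    return {f"{proto}/{port}" for proto, port in flows}


def service_prevalence(flows):
    """Share of flows per service (assumed definition of prevalence)."""
    counts = Counter(f"{proto}/{port}" for proto, port in flows)
    total = sum(counts.values())
    return {service: n / total for service, n in counts.items()}


def generalized(flows, g):
    """Set of (protocol, port bin) pairs, with ports grouped into bins of
    width g. Treating g as a bin width is an assumption of this sketch."""
    return {(proto, port // g) for proto, port in flows}


sl = service_list(flows)
sp = service_prevalence(flows)
gen = generalized(flows, 2048)
```

Under these assumptions, the two high-numbered UDP ports appear as separate entries in the service list but fall into the same bin in the generalized form, which is the kind of stabilizing effect a coarser granularity would be expected to have on devices that use ephemeral ports.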

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: IoT devices exhibit distinct patterns (a, b) in their usage of network services. While Service List (SL) representations may yield stable fingerprints for some devices (c), they are unhelpful for devices that communicate with a large range of dynamic or ephemeral destination ports (d). The Service Prevalence (SP) representation can stabilize this behavior (f), but it can be too sensitive to variability in network usage (e). The Generalized (G) representations, when used with an appropriate granularity level, address these shortfalls (g, h).
  • Figure 2: Effect of threshold $\theta$ and granularity level $g$ on: (a) fingerprint convergence (fraction of fingerprinted devices), and (b) recurrence score. Higher $\theta$ and lower $g$ reduce convergence, while recurrence improves with stricter $\theta$ but degrades at extreme $g$ values.
  • Figure 3: Confusion matrices for classification using the augmented fingerprints pool at $(g, \theta)=(2048, 0.95)$: (a) closed-set scenario, (b) open-set scenario.
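
The role of the threshold $\theta$ in Figure 2 can be illustrated with a small sketch. The exporter's actual procedure is not described in this summary, so everything below is an assumption: Jaccard similarity as the similarity measure, set-valued representations per observation window, and the hypothetical helper `first_stable_window`, which treats a fingerprint as converged once two consecutive windows are at least $\theta$-similar.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def first_stable_window(windows, theta):
    """Return the index of the first window whose service set is at least
    theta-similar to the previous window, or None if no window qualifies.
    This convergence rule is an illustrative assumption, not the paper's."""
    for i in range(1, len(windows)):
        if jaccard(windows[i - 1], windows[i]) >= theta:
            return i
    return None


# Hypothetical per-window service sets for one device.
windows = [
    {"TCP/443"},
    {"TCP/443", "UDP/53"},
    {"TCP/443", "UDP/53"},
]

strict = first_stable_window(windows, theta=0.95)
loose = first_stable_window(windows, theta=0.5)
```

In this toy example the looser threshold accepts the fingerprint one window earlier than the stricter one, which matches the direction of the trend in the caption: a higher $\theta$ makes convergence harder to reach.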