Locality Sensitive Hashing for Network Traffic Fingerprinting
Nowfel Mashnoor, Jay Thom, Abdur Rouf, Shamik Sengupta, Batyr Charyyev
TL;DR
The paper addresses IoT device identification for network management, noting that ML-based traffic fingerprinting requires feature extraction, hyperparameter tuning, and retraining to handle concept drift. It proposes LSIF-R, a Locality-Sensitive Hashing approach based on a tunable Nilsimsa function that generates traffic fingerprints without feature extraction and stores them as device signatures. Through extensive parameter exploration, LSIF-R achieves up to around 94% identification accuracy on 23 IoT devices and outperforms a state-of-the-art ML-based method (IoTSentinel) by about 12 percentage points in F1. The work emphasizes lightweight operation, easy update of signatures, and robustness to drift, highlighting practical benefits for scalable, privacy-conscious device identification in dynamic networks.
Abstract
The advent of the Internet of Things (IoT) has introduced new complexities and challenges to computer networks. IoT devices are particularly susceptible to cyber-attacks because of their simplistic design. It is therefore crucial to identify these devices within a network, both for network administration and for detecting malicious activity. Network traffic fingerprinting is a key technique for device identification and anomaly detection. Currently, the predominant fingerprinting methods depend heavily on machine learning (ML). However, ML methods require feature selection, hyperparameter tuning, and model retraining to attain optimal results and to remain robust to the concept drift observed in a network. In this paper, we propose using locality-sensitive hashing (LSH) for network traffic fingerprinting to overcome these difficulties. We examine several design options for the Nilsimsa LSH function and then use this function to create unique fingerprints for network data, which can be used to identify devices. We also compared our method with ML-based traffic fingerprinting and observed that it increases the accuracy of the state of the art by 12%, achieving around 94% accuracy in identifying devices in a network.
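To make the idea concrete, the following is a minimal sketch of hash-based fingerprinting, not the paper's implementation. It assumes a simplified Nilsimsa-style digest (hashed byte trigrams counted into 256 buckets, then thresholded at the mean count) and nearest-signature matching by number of agreeing bits. The names `toy_lsh_digest`, `similarity`, and `identify`, the bucket count, and the sample byte strings are all hypothetical choices made for illustration; the real Nilsimsa function and the tunable parameters explored in the paper differ in detail.

```python
import hashlib
from typing import Dict, List

BUCKETS = 256  # assumed digest length in bits (illustrative choice)


def toy_lsh_digest(data: bytes, window: int = 3) -> int:
    """Simplified Nilsimsa-style digest (an assumption, not real Nilsimsa).

    Every sliding window of `window` bytes is hashed into one of BUCKETS
    counters; bit b of the digest is set when counter b exceeds the mean.
    """
    counts = [0] * BUCKETS
    for i in range(len(data) - window + 1):
        gram = data[i:i + window]
        # A 1-byte BLAKE2b digest maps the n-gram to a bucket in 0..255.
        bucket = hashlib.blake2b(gram, digest_size=1).digest()[0]
        counts[bucket] += 1
    mean = sum(counts) / BUCKETS
    digest = 0
    for b, c in enumerate(counts):
        if c > mean:
            digest |= 1 << b
    return digest


def similarity(a: int, b: int) -> int:
    """Number of bit positions (0..BUCKETS) on which two digests agree."""
    return BUCKETS - bin(a ^ b).count("1")


def identify(sample: bytes, signatures: Dict[str, List[int]]) -> str:
    """Return the device whose stored digests best match the sample."""
    d = toy_lsh_digest(sample)
    return max(
        signatures,
        key=lambda dev: max(similarity(d, s) for s in signatures[dev]),
    )


if __name__ == "__main__":
    # Hypothetical traffic payloads standing in for captured flows.
    camera = b"GET /stream HTTP/1.1\r\nHost: cam.local\r\n" * 8
    plug = b"\x10\x22\x00\x04MQTT\x04\x02\x00<plug-telemetry>" * 8
    db = {"camera": [toy_lsh_digest(camera)], "plug": [toy_lsh_digest(plug)]}
    print(identify(camera + b"extra bytes", db))
```

In this sketch, adding a device or refreshing one after its behaviour drifts amounts to appending a new digest to its list; no feature extraction or model retraining is involved, which is the operational benefit the abstract describes.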
