IoT Firmware Version Identification Using Transfer Learning with Twin Neural Networks
Ashley Andrews, George Oikonomou, Simon Armour, Paul Thomas, Thomas Cattermole
TL;DR
This work addresses IoT firmware version identification under limited labeled data by combining flow-based on-wire features, greyscale image representations, and transfer learning via Twin Neural Networks. By using Hedges' g to assess differences in similarity scores, the approach detects both stable versions and subtle version changes, achieving up to 95.83% accuracy for stable versions and 84.38% for version changes in a lab of 12 devices recorded over 11 days. The method performs strongly and consistently across runs, while revealing device-specific fingerprint variability that affects detection; Hedges' g provides a practical measure for change detection that does not require per-device thresholds. The authors propose a cloud-based deployment with a fingerprint database and re-training to enable scalable, automated IoT security monitoring and anomaly detection in real-world networks.
Abstract
As the Internet of Things (IoT) becomes more embedded within our daily lives, there is growing concern about the risk that 'smart' devices pose to network security. To address this, one avenue of research has focused on automated IoT device identification. However, research has largely neglected the identification of IoT device firmware versions. There is strong evidence that IoT security relies on devices running the latest version, patched for known vulnerabilities. Identifying whether a device has updated (has changed version) or not (is on a stable version) is therefore useful for IoT security. Version identification involves challenges beyond those of identifying the model, type, and manufacturer of IoT devices, and traditional machine learning algorithms are ill-suited to it because they are limited by the availability of training data. In this paper, we introduce an effective technique for identifying IoT device versions based on transfer learning. This technique relies on the idea that a Twin Neural Network (TNN), trained to distinguish between devices, can be used to detect differences between versions of the same device. This facilitates real-world implementation by requiring relatively little training data. We extract statistical features from on-wire packet flows, convert these features into greyscale images, pass these images into a TNN, and determine version changes based on the Hedges' g effect size of the similarity scores. This allows us to detect the subtle changes present in on-wire traffic when a device changes version. To evaluate our technique, we set up a lab containing 12 IoT devices and recorded their on-wire packet captures for 11 days across multiple firmware versions. On testing data held out from training, our best-performing model achieves 95.83% accuracy at identifying stable versions and 84.38% accuracy at identifying version changes.
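The final step of the pipeline, comparing two sets of TNN similarity scores with Hedges' g, can be illustrated with a minimal sketch. It uses the standard textbook definition of Hedges' g (Cohen's d with a pooled standard deviation, multiplied by the usual small-sample correction factor). The function names, the sample scores, and the decision threshold of 0.8 are assumptions made for illustration only; they are not taken from the paper, which may compute and interpret the effect size differently.

```python
import math
import statistics


def hedges_g(scores_a, scores_b):
    """Hedges' g effect size between two samples of similarity scores.

    Standard definition: Cohen's d with a pooled standard deviation,
    multiplied by the approximate small-sample correction 1 - 3 / (4N - 9),
    where N is the combined sample size.
    """
    n1, n2 = len(scores_a), len(scores_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two scores")

    var1 = statistics.variance(scores_a)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(scores_b)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")

    cohens_d = (statistics.mean(scores_a) - statistics.mean(scores_b)) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


def version_changed(reference_scores, new_scores, threshold=0.8):
    """Hypothetical decision rule: flag a change when |g| is large.

    The 0.8 default follows the common convention for a 'large' effect;
    it is an illustrative assumption, not a value from the paper.
    """
    return abs(hedges_g(reference_scores, new_scores)) >= threshold


# Made-up similarity scores for one device before and after an update.
before = [0.90, 0.92, 0.91, 0.89, 0.93]
after = [0.70, 0.72, 0.71, 0.69, 0.73]
print(version_changed(before, after))
print(version_changed(before, before))
```

Because an effect size is normalised by the pooled spread of the scores, a single cut-off can in principle be shared across devices whose similarity scores vary by different amounts, which is consistent with the motivation given above for using Hedges' g.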
