ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification
Binghui Wu, Philipp Gysel, Dinil Mon Divakaran, Mohan Gurusamy
TL;DR
This work tackles IoT device classification when new devices, unseen during training, appear in the network. It introduces ZEST, a generative zero-shot learning (ZSL) framework built on self-attention. ZEST uses SANE, a transformer-based feature extractor, to derive latent representations and device attributes from traffic sequences. It then uses a conditional variational autoencoder (CVAE) to generate pseudo data for unseen devices, so that a single supervised classifier can label both seen and unseen devices. The approach achieves state-of-the-art results on the UNSW 2018 IoT dataset, with substantial accuracy improvements in both the ZSL and generalized zero-shot learning (GZSL) settings, and it demonstrates faster inference and better feature extraction than LSTM baselines. These findings suggest ZEST’s practical potential for robust IoT device fingerprinting in real networks, where device inventories continually evolve.
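The self-attention operation that SANE builds on can be illustrated with a minimal pure-Python sketch: single-head scaled dot-product attention over a sequence of per-packet feature vectors, followed by mean pooling into one latent vector for the whole sequence. This is not SANE itself. The two per-packet features, the projection matrices `w_q`, `w_k`, `w_v`, and all helper names are hypothetical. A real transformer encoder would also use learned weights, positional encodings, multiple heads, and feed-forward layers.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x of shape (T x d)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    # scores[i][j] = <q_i, k_j> / sqrt(d_k): how strongly packet i attends to packet j
    scores = [[sum(qi[t] * kj[t] for t in range(d_k)) / math.sqrt(d_k) for kj in k]
              for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def encode_sequence(x, w_q, w_k, w_v):
    """Mean-pool the attended outputs into one fixed-size latent vector for the sequence."""
    out, _ = self_attention(x, w_q, w_k, w_v)
    return [sum(row[j] for row in out) / len(out) for j in range(len(out[0]))]


# Hypothetical sequence of 3 packets, each with 2 normalized features
# (e.g. packet size and inter-arrival time); identity projections for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
packets = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
attended, attn_weights = self_attention(packets, identity, identity, identity)
latent = encode_sequence(packets, identity, identity, identity)
print(latent)
```

Because this sketch has no positional encoding, shuffling the packets leaves the pooled vector unchanged; a model of traffic sequences would normally add positional information so that packet order can influence the representation.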
Abstract
Recent research has proposed machine learning models for classifying IoT devices connected to a network. However, a practical challenge remains: not all devices (and hence not all of their traffic) are available when a model is trained. Consequently, during the operational phase, the model must classify new devices that were not seen in the training phase. To address this challenge, we propose ZEST, a zero-shot learning (ZSL) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention-based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data to classify devices. We carry out extensive experiments on real IoT traffic data; they demonstrate that i) ZEST achieves a significant improvement in accuracy over the baselines, and ii) SANE extracts more meaningful representations than LSTM, which has commonly been used for modeling network traffic.
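The generate-then-classify idea behind components ii) and iii) can be sketched as follows. This is a toy illustration under stated assumptions, not ZEST's implementation. The "trained" CVAE decoder is replaced by a fixed, hypothetical linear map of the concatenation of a noise vector z and a class attribute vector. The attribute vectors, feature values, and device names are made up, and a nearest-centroid rule stands in for the supervised classifier. As an additional assumption, pseudo features are generated only for the unseen class and combined with real latent features of the seen classes, a common generative zero-shot recipe; the classifier could equally be trained on pseudo data for every class.

```python
import math
import random


def decode(z, attr, w):
    """Hypothetical trained CVAE decoder: a fixed linear map of the concatenation [z; attr]."""
    inp = z + attr  # list concatenation: noise first, then the class attribute vector
    return [sum(w_row[i] * inp[i] for i in range(len(inp))) for w_row in w]


def generate_pseudo(attr, w, n, z_dim, rng):
    """Sample z ~ N(0, I) n times and decode each sample conditioned on the class attribute."""
    return [decode([rng.gauss(0.0, 1.0) for _ in range(z_dim)], attr, w) for _ in range(n)]


def fit_nearest_centroid(data_by_class):
    """Stand-in supervised model: store the mean feature vector of each class."""
    return {label: [sum(col) / len(points) for col in zip(*points)]
            for label, points in data_by_class.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean) to feature vector x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical decoder weights: feature = 0.1 * z + attr (2-D noise, 2-D attributes, 2-D features).
W = [[0.1, 0.0, 1.0, 0.0],
     [0.0, 0.1, 0.0, 1.0]]

# Real latent features for seen devices (e.g. from the feature extractor); values are made up.
seen = {
    "camera": [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]],
    "plug": [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]],
}
# Attribute vector for a device type with no training traffic (hypothetical).
unseen_attrs = {"bulb": [1.0, 1.0]}

rng = random.Random(0)
training_data = dict(seen)
for label, attr in unseen_attrs.items():
    training_data[label] = generate_pseudo(attr, W, n=50, z_dim=2, rng=rng)

model = fit_nearest_centroid(training_data)
print(predict(model, [0.95, 1.05]))
```

In an actual CVAE, the decoder is learned jointly with an encoder by maximizing a conditional evidence lower bound, and the downstream classifier would typically be a learned model rather than a centroid rule. The sketch is only meant to show the data flow: attributes in, pseudo features out, one classifier over seen and unseen labels.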
