A Multi-Step Comparative Framework for Anomaly Detection in IoT Data Streams
Mohammed Al-Qudah, Fadi AlMahamid
TL;DR
This study tackles the challenge of IoT anomaly detection by systematically evaluating how preprocessing choices interact with three ML models (RNN-LSTM, Autoencoder, and Gradient Boosting) on the IoTID20 dataset. It introduces a six-step framework that jointly considers normalization, transformation, and feature selection under uniform conditions. The results show that Gradient Boosting achieves the highest accuracy across configurations, while RNN-LSTM benefits from z-score normalization and Autoencoders excel in recall, highlighting complementary strengths for deployed pipelines. The framework offers practical guidelines for selecting preprocessing strategies and model types to improve IoT security in real-world data streams.
Abstract
The rapid expansion of Internet of Things (IoT) devices has introduced critical security challenges, underscoring the need for accurate anomaly detection. Although numerous studies have proposed machine learning (ML) methods for this purpose, limited research systematically examines how different preprocessing steps (normalization, transformation, and feature selection) interact with distinct model architectures. To address this gap, this paper presents a multi-step evaluation framework that assesses the combined impact of preprocessing choices on three ML algorithms: RNN-LSTM, autoencoder neural networks (ANN), and Gradient Boosting (GBoosting). Experiments on the IoTID20 dataset show that GBoosting consistently delivers superior accuracy across preprocessing configurations, while RNN-LSTM shows notable gains with z-score normalization and autoencoders excel in recall, making them well-suited for unsupervised scenarios. By offering a structured analysis of preprocessing decisions and their interplay with various ML techniques, the proposed framework provides actionable guidance to enhance anomaly detection performance in IoT environments.
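As a minimal illustration of the ideas summarized above, the following Python sketch shows z-score normalization of a single feature and how a grid of preprocessing choices crossed with models might be enumerated. It is not the authors' implementation: the option names in the lists, the sample values in `packet_sizes`, and the helper `enumerate_configurations` are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical option names; the paper's actual choices may differ.
NORMALIZATIONS = ["none", "z-score"]
TRANSFORMATIONS = ["none", "log"]
FEATURE_SELECTION = ["all-features", "selected-subset"]
MODELS = ["RNN-LSTM", "Autoencoder", "GBoosting"]


def enumerate_configurations():
    """List every (normalization, transformation, selection, model) combination."""
    return list(product(NORMALIZATIONS, TRANSFORMATIONS, FEATURE_SELECTION, MODELS))


# Made-up feature values (mean 5, population standard deviation 2).
packet_sizes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_score(packet_sizes)
configs = enumerate_configurations()
```

Evaluating every model under every combination, with the same data split and metrics, is what allows preprocessing effects to be separated from model effects; in this sketch the grid contains 2 x 2 x 2 x 3 = 24 configurations.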
