A Multi-Step Comparative Framework for Anomaly Detection in IoT Data Streams

Mohammed Al-Qudah, Fadi AlMahamid

TL;DR

This study tackles the challenge of IoT anomaly detection by systematically evaluating how preprocessing choices interact with three ML models—RNN-LSTM, Autoencoder, and Gradient Boosting—on the IoTID20 dataset. It introduces a six-step framework that jointly considers normalization, transformation, and feature selection under uniform conditions. The results show Gradient Boosting achieves the highest accuracy across configurations, while RNN-LSTM benefits from z-score normalization and Autoencoders excel in recall, highlighting complementary strengths for deployed pipelines. The framework offers practical guidelines for selecting preprocessing strategies and model types to improve IoT security in real-world data streams.

Abstract

The rapid expansion of Internet of Things (IoT) devices has introduced critical security challenges, underscoring the need for accurate anomaly detection. Although numerous studies have proposed machine learning (ML) methods for this purpose, limited research systematically examines how different preprocessing steps (normalization, transformation, and feature selection) interact with distinct model architectures. To address this gap, this paper presents a multi-step evaluation framework assessing the combined impact of preprocessing choices on three ML algorithms: RNN-LSTM, autoencoder neural networks (ANN), and Gradient Boosting (GBoosting). Experiments on the IoTID20 dataset show that GBoosting consistently delivers superior accuracy across preprocessing configurations, while RNN-LSTM shows notable gains with z-score normalization and autoencoders excel in recall, making them well-suited for unsupervised scenarios. By offering a structured analysis of preprocessing decisions and their interplay with various ML techniques, the proposed framework provides actionable guidance to enhance anomaly detection performance in IoT environments.

Paper Structure

This paper contains 24 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Proposed multi-step anomaly detection methodology in IoT environments.
  • Figure 2: Feature distributions before and after applying min-max scaling and z-score normalization.
  • Figure 3: Heatmaps illustrating feature correlations before and after applying RFECV and Chi2 feature selection methods.
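
For readers unfamiliar with the two normalization schemes compared in Figure 2, the following is a minimal sketch of min-max scaling and z-score normalization applied to a single feature column. It is not the authors' implementation: the function names and the toy `packet_sizes` values are hypothetical, chosen only for illustration, and the population standard deviation is assumed for the z-score.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center values at zero mean and scale to unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid division by zero for a constant feature.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature column (not taken from IoTID20).
packet_sizes = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

scaled = min_max_scale(packet_sizes)
standardized = z_score_normalize(packet_sizes)

print(scaled)
print(standardized)
```

Min-max scaling bounds every value between 0 and 1 but is sensitive to extreme values, while z-score normalization leaves values unbounded and expresses each one as a distance from the mean in standard deviations.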