LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems

Ali Ghubaish, Zebo Yang, Aiman Erbad, Raj Jain

TL;DR

The paper addresses intrusion detection in IoT systems, where data is high-dimensional and computational resources are constrained. It introduces LEMDA, a feature engineering approach built on Mean Decrease in Accuracy (MDA) that combines a Weighted Exponential Decay Formula (WEDF) with an optional Sensitivity Factor (SF) to generate a new feature, enabling faster training and better detection performance across multiple models and datasets. Across three IoT datasets (WUSTL-EHMS, MQTT-IoT, BOT-IoT) and four models, LEMDA improves the F1 score by 34% on average and reduces average training and detection times in most cases, outperforming PCA and MDA baselines. The method is model-agnostic and scalable, which makes it practical for real-time IDS in large-scale IoT networks and may inform feature engineering strategies in other domains.

Abstract

Intrusion detection systems (IDS) for Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of high-dimensional data, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding a model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability or computational efficiency. Feature engineering can address these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.
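
To make the MDA ingredient concrete, the sketch below computes Mean Decrease in Accuracy in its common permutation-based form: shuffle one feature at a time and record how much accuracy drops. It illustrates MDA only. It does not implement LEMDA's exponential decay (WEDF) or sensitivity factor, whose formulas are not given in this summary. The nearest-centroid classifier, the toy data, and all function names are assumptions chosen to keep the example dependency-free.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier (a simple stand-in model, not from the paper)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist2)

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

def mean_decrease_in_accuracy(centroids, X, y, n_repeats=20, seed=0):
    """Average accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        scores.append(sum(drops) / n_repeats)
    return scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[label * 4.0 + data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for label in y]

model = fit_centroids(X, y)
mda = mean_decrease_in_accuracy(model, X, y)

# Rank features from most to least important by their MDA score.
ranked = sorted(range(len(mda)), key=lambda j: mda[j], reverse=True)
print("MDA per feature:", mda)
print("Ranking:", ranked)
```

A ranking of this kind is the starting point the abstract refers to. How LEMDA weights and combines the ranked features is defined in the paper itself.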

Paper Structure (19 sections, 11 equations, 6 tables, 2 algorithms)