Dealing with Imbalanced Classes in Bot-IoT Dataset
Jesse Atuhurra, Takanori Hara, Yuanyu Zhang, Masahiro Sasabe, Shoji Kasahara
TL;DR
This work tackles the class-imbalance problem in the Bot-IoT dataset, an IoT network dataset used for network intrusion detection. It adopts a three-stage pipeline (preprocessing, SMOTE-based data sampling, and binary classification) and evaluates seven classifiers, including LR, SVM variants, RF, XGBoost, and MLP, on detecting attack packets. Results show that SMOTE-balanced training preserves high accuracy and recall while substantially reducing the false positive rate (FPR), with competitive AUC scores (notably for RBF SVM and RF). The findings highlight the practical value of data balancing for IoT NIDS and point to future directions in reinforcement learning, federated learning, and multi-class attack identification.
Abstract
With the rapid spread of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting against various types of attacks in IoT networks. To evaluate the robustness of NIDSs in IoT networks, existing work proposed a realistic botnet dataset (the Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset is imbalanced: the number of normal packets is much smaller than that of attack packets. Such imbalance can make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method that uses the synthetic minority over-sampling technique (SMOTE). The proposed classifier aims to detect attack packets while overcoming the class imbalance problem. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
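To make the over-sampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation, not the authors' implementation. The function name `smote_sketch`, the parameter `k`, and the toy feature vectors are hypothetical and chosen for illustration only. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors. In practice one would more likely rely on an established library implementation (for example the `SMOTE` class in imbalanced-learn) and apply it to the training split only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration; it assumes numeric feature vectors
    of equal length and at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of base, excluding base itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed): 3 "normal" (minority) and 9 "attack" (majority) vectors.
normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attack = [[5.0 + i, 5.0] for i in range(9)]

# Generate just enough synthetic normal samples to match the attack count.
new_normal = smote_sketch(normal, n_new=len(attack) - len(normal), k=2)
balanced_normal = normal + new_normal
print(len(balanced_normal), len(attack))
```

Because every synthetic point lies between two existing minority samples, the minority class grows without leaving the region its real samples already occupy; the balanced set can then be passed to any binary classifier.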
