Dealing with Imbalanced Classes in Bot-IoT Dataset

Jesse Atuhurra; Takanori Hara; Yuanyu Zhang; Masahiro Sasabe; Shoji Kasahara

TL;DR

This work tackles the class-imbalance problem in Bot-IoT, an IoT network dataset used for network intrusion detection. It adopts a three-stage pipeline (preprocessing, SMOTE-based data sampling, and binary classification) and evaluates seven classifiers, including LR, SVM variants, RF, XGBoost, and MLP, on detecting attack packets. Results show that SMOTE-balanced training preserves high accuracy and recall while substantially reducing the false positive rate (FPR), with competitive AUC scores (notably for RBF SVM and RF). The findings highlight the practical value of data balancing for IoT NIDS and point to future directions in reinforcement learning, federated learning, and multi-class attack identification.
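To make the metrics named above concrete, here is a minimal sketch of how accuracy, recall, and FPR follow from confusion-matrix counts. It assumes that attack packets are the positive class and normal packets the negative class; the counts are invented for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, recall, and false positive rate from confusion counts.

    Assumption: positive = attack packet, negative = normal packet.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),   # share of attacks that were detected
        "fpr": fp / (fp + tn),      # share of normal packets flagged as attacks
    }


# Hypothetical counts: 10,000 attack packets and only 10 normal packets.
m = binary_metrics(tp=9990, fp=8, tn=2, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With so few normal packets, a classifier can misclassify most of them and still score high accuracy and recall, which is why FPR is the metric that exposes the effect of the imbalance.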

Abstract

With the rapidly spreading usage of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting against various types of attacks in the IoT network. To evaluate the robustness of the NIDS in the IoT network, existing work proposed a realistic botnet dataset for the IoT network (the Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset is imbalanced because the number of normal packets is much smaller than the number of attack packets. Such an imbalance can make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method based on the synthetic minority over-sampling technique (SMOTE). The proposed classifier aims to detect attack packets while overcoming the class imbalance problem using the SMOTE algorithm. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
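The core idea of SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The following is a simplified, standard-library illustration of that idea, not the implementation used in this work; the function name `smote`, the toy two-feature samples, and the parameter values are all assumptions made for the example.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic samples by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): normal packets are the minority class.
normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attack = [[5.0 + 0.1 * i, 5.0] for i in range(20)]

# Oversample the minority class until both classes have the same size.
new_normal = smote(normal, len(attack) - len(normal), k=3)
balanced_normal = normal + new_normal
print(len(balanced_normal), len(attack))
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.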

Paper Structure (21 sections, 8 equations, 9 figures, 7 tables)

Introduction
Related Work
Background
Bot-IoT Dataset
Machine Learning based Binary Classification
Logistic Regression
Support Vector Machine
Random Forest
Extreme Gradient Boosting
Multi-layer Perceptron Neural Network
Data Sampling Methods
Proposed Method
Overview
Preprocessing
Data Sampling
...and 6 more sections

Figures (9)

Figure 1: Comparison of datasets used in intrusion detection (F=False, T=True).
Figure 2: An example of MLP with two hidden layers.
Figure 3: Intrusion Detection analysis based on Bot-IoT dataset.
Figure 4: Feature importance with the random forest algorithm.
Figure 5: Feature importance with the mutual information algorithm.
...and 4 more figures
