
Performance of Machine Learning Classifiers for Anomaly Detection in Cyber Security Applications

Markus Haug, Gissel Velarde

TL;DR

The paper investigates binary classification under severe class imbalance for anomaly detection in cyber security applications. It compares supervised models (XGBoost, MLP) with unsupervised/generative approaches (GAN, VAE, MO-GAAL) on two public datasets, and additionally combines XGBoost and MLP with random oversampling (ROS) and self-paced ensembling (SPE). The study uses a standardized pipeline with an 80/20 stratified split and 5-fold cross-validation, and it also examines missing-data imputation. XGBoost and MLP consistently outperform the generative models; ROS boosts precision, and MLP+ROS often performs strongly. IterativeImputer is computationally expensive and offers no advantage on large datasets. The study provides practical guidance on model and imputation choices for imbalanced cyber security tasks, and the code is publicly available on GitHub for reproducibility.

Abstract

This work empirically evaluates machine learning models on two imbalanced public datasets (KDDCUP99 and Credit Card Fraud 2013). The method comprises data preparation, model training, and evaluation, using an 80/20 (train/test) split. The models tested are eXtreme Gradient Boosting (XGB), Multi-Layer Perceptron (MLP), Generative Adversarial Network (GAN), Variational Autoencoder (VAE), and Multiple-Objective Generative Adversarial Active Learning (MO-GAAL); XGB and MLP are further combined with Random-Over-Sampling (ROS) and Self-Paced-Ensemble (SPE). Evaluation involves 5-fold cross-validation and imputation techniques (mean, median, and IterativeImputer) with 10%, 20%, 30%, and 50% missing data. The findings show that XGB and MLP outperform the generative models. IterativeImputer yields results comparable to mean and median imputation, but it is not recommended for large datasets because of its added complexity and execution time. The code is publicly available on GitHub (github.com/markushaug/acr-25).
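
To make the split and oversampling steps concrete, the following is a minimal, standard-library-only sketch of a stratified 80/20 split followed by random oversampling of the training portion. It is not the authors' implementation: the helper names `stratified_split` and `random_oversample`, the toy labels, and the seeds are assumptions made for illustration, and the paper's own code may well rely on library routines instead. The sketch assumes oversampling is applied to the training indices only, so the held-out test set keeps its original class ratio.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    resampled = []
    for idxs in by_class.values():
        resampled.extend(idxs)
        # Sampling with replacement; k is 0 for the majority class.
        resampled.extend(rng.choices(idxs, k=target - len(idxs)))
    return resampled


# Hypothetical toy labels (not from the paper): 1000 samples, 2% positives.
labels = [1] * 20 + [0] * 980
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
balanced_train_idx = random_oversample(train_idx, labels, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
balanced_counts = Counter(labels[i] for i in balanced_train_idx)
```

In this toy setting the test split keeps the 2% positive rate while the resampled training indices are balanced. When oversampling is combined with cross-validation, the same idea would apply per fold: resample only the training part of each fold so that duplicated minority samples never leak into the validation part.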

Paper Structure

This paper contains 11 sections, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Graphical representation of the Method. Each dataset goes through a Pre-Processing stage before Model Training and Evaluation. Models are trained and tested independently. XGB and MLP are selected and combined with ROS and SPE.
  • Figure 2: Models' performance on the credit card dataset, based on 5-fold cross-validation on the training data. For XGB and MLP, only the best sampling combinations are shown.
  • Figure 3: Models' performance on the KDDCUP99 dataset, based on 5-fold cross-validation on the training data. For XGB and MLP, only the best sampling combinations are shown.
  • Figure 4: Training times in seconds (Execution Time). XGB proves to be the most efficient model, while MO-GAAL stands out due to its very long training time. For XGB and MLP, only the best sampling combinations are shown.
  • Figure 5: XGB performance with missing data using imputation techniques on the credit card and KDDCUP99 datasets. The error bars represent the standard deviations of the measurements, each of which was performed ten times.
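
The missing-data experiment summarized in Figure 5 can be sketched along the following lines, using only the Python standard library. This is an illustration under stated assumptions, not the authors' procedure: the helpers `inject_missing` and `impute`, the toy matrix, and the choice to mask cells uniformly at random are all hypothetical. Only mean and median imputation are shown; IterativeImputer, which the paper also evaluates, is not reproduced here.

```python
import math
import random
import statistics


def inject_missing(rows, fraction, seed=0):
    """Return a copy of rows with `fraction` of the cells (chosen at random) set to NaN."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(rows)) for c in range(len(rows[0]))]
    masked = set(rng.sample(cells, round(len(cells) * fraction)))
    return [
        [math.nan if (r, c) in masked else v for c, v in enumerate(row)]
        for r, row in enumerate(rows)
    ]


def impute(rows, strategy="mean"):
    """Fill NaN cells column by column with the mean or median of the observed values."""
    stat = statistics.mean if strategy == "mean" else statistics.median
    fills = []
    for c in range(len(rows[0])):
        observed = [row[c] for row in rows if not math.isnan(row[c])]
        # Fallback of 0.0 (an arbitrary choice) if a column has no observed values.
        fills.append(stat(observed) if observed else 0.0)
    return [
        [fills[c] if math.isnan(v) else v for c, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy feature matrix (not from the paper): 50 rows x 4 columns.
rows = [[float(r + c) for c in range(4)] for r in range(50)]

nan_counts = {}
remaining = {}
for fraction in (0.1, 0.2, 0.3, 0.5):  # missing-data levels named in the abstract
    corrupted = inject_missing(rows, fraction, seed=1)
    nan_counts[fraction] = sum(math.isnan(v) for row in corrupted for v in row)
    filled = impute(corrupted, strategy="median")
    remaining[fraction] = sum(math.isnan(v) for row in filled for v in row)
```

For simplicity the fill values are computed on the same rows they are applied to. In a train/test pipeline one would typically compute them on the training data and reuse them for the test data, and repeat the whole procedure with different seeds to obtain standard deviations like those shown as error bars.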