A Hypergraph-Based Machine Learning Ensemble Network Intrusion Detection System

Zong-Zhi Lin, Thomas D. Pike, Mark M. Bailey, Nathaniel D. Bastian

TL;DR

The paper tackles the challenge of real-time network intrusion detection amid evolving port-scan and adversarial threats. It introduces a hypergraph-based representation of port-scan activity and derives $s$-closeness centrality features to augment an ML ensemble comprising RF and LightGBM models. Through an online evaluation framework with adversarial data augmentation and an Update-ALL-NIDS retraining policy, the approach achieves near-perfect detection on port-scan data and the CIC-IDS2017 dataset, demonstrating robustness and resiliency. The work highlights the practical impact of incorporating hypergraph metrics into adaptive NIDS, enabling timely retraining and improved defense against sophisticated cyber threats.

Abstract

Network intrusion detection systems (NIDS) continue to face challenges in detecting malicious attacks. NIDS are often developed offline while they face auto-generated port scan infiltration attempts, resulting in a significant time lag from adversarial adaptation to NIDS response. To address these challenges, we use hypergraphs focused on internet protocol addresses and destination ports to capture evolving patterns of port scan attacks. The derived set of hypergraph-based metrics is then used to train an ensemble machine learning (ML) based NIDS that allows for real-time adaptation in monitoring and detecting port scanning activities, other types of attacks, and adversarial intrusions with high accuracy, precision, and recall. This adaptive ML NIDS was developed through the combination of (1) intrusion examples, (2) NIDS update rules, (3) attack threshold choices to trigger NIDS retraining requests, and (4) a production environment with no prior knowledge of the nature of network traffic. Forty scenarios were auto-generated to evaluate the ML ensemble NIDS comprising three tree-based models. The resulting ML ensemble NIDS was extended and evaluated with the CIC-IDS2017 dataset. Results show that under the Update-ALL-NIDS rule (that is, retraining and updating all three models upon the same NIDS retraining request), the proposed ML ensemble NIDS evolved intelligently and produced the best results, with nearly 100% detection performance throughout the simulation.
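The Update-ALL-NIDS rule described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `ThresholdModel` stand-in (a toy replacement for a tree-based model), the majority vote, and the reading of the "attack threshold" as a running count of flagged records. Only the core idea, that one retraining request refits all three models together, comes from the abstract.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label): 1 = attack, 0 = benign


class ThresholdModel:
    """Hypothetical stand-in for a tree-based model (not the paper's models).

    It flags a record as an attack when the first feature exceeds a cut
    placed midway between the benign and attack training examples.
    """

    def __init__(self) -> None:
        self.cut = 0.0
        self.fit_count = 0

    def fit(self, samples: List[Sample]) -> None:
        attacks = [x[0] for x, y in samples if y == 1]
        benign = [x[0] for x, y in samples if y == 0]
        if attacks and benign:
            self.cut = (min(attacks) + max(benign)) / 2
        self.fit_count += 1

    def predict(self, x: Sequence[float]) -> int:
        return int(x[0] > self.cut)


class UpdateAllEnsemble:
    """Majority-vote ensemble that refits every member on one retraining request."""

    def __init__(self, models: List[ThresholdModel], attack_threshold: int) -> None:
        self.models = models
        self.attack_threshold = attack_threshold  # assumed: count of flagged records
        self.attacks_seen = 0

    def predict(self, x: Sequence[float]) -> int:
        votes = sum(m.predict(x) for m in self.models)
        return int(2 * votes > len(self.models))

    def observe(self, x: Sequence[float], training_pool: List[Sample]) -> Tuple[int, bool]:
        """Classify one record; retrain ALL models once the threshold is reached."""
        label = self.predict(x)
        self.attacks_seen += label
        if self.attacks_seen >= self.attack_threshold:
            for m in self.models:  # Update-ALL: no model is left stale
                m.fit(training_pool)
            self.attacks_seen = 0
            return label, True
        return label, False


pool: List[Sample] = [([0.1], 0), ([0.9], 1)]
members = [ThresholdModel() for _ in range(3)]
for member in members:
    member.fit(pool)
nids = UpdateAllEnsemble(members, attack_threshold=2)
history = [nids.observe(x, pool) for x in ([0.8], [0.2], [0.7])]
```

In this toy run the second flagged record reaches the assumed threshold, so all three stand-in models are refit in the same step. A policy that refits only some members would differ only in the loop inside `observe`.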

Paper Structure

This paper contains 18 sections, 1 equation, 13 figures, 4 tables, 2 algorithms.

Figures (13)

  • Figure 1: Evaluation Framework for the ML Ensemble NIDS
  • Figure 2: This hypergraph with 15 edges and 34 vertices was constructed from 43 records, with the benign and port scan attack class labels denoted as B and $\wedge$, respectively. The ports 21, 53, 443, 56344, 8613, 35066, 42154, and 55107, in normal use by source and destination IPs, were also part of a smaller set of centers of concentric circles associated with source or destination IPs.
  • Figure 3: Trends of $s$-Closeness-Centrality ($s$-C-C) metrics of hyperedges for port scan records in a hypergraph whose hyperedges are defined as source or destination IP addresses and whose nodes are the scanned destination ports. The chart shows hyperedges of destination IP, based on the $15,806$ port-scan-only records of the CIC-IDS2017 port scan dataset.
  • Figure 4: Trends of mean $s$-Closeness-Centrality ($s$-C-C) metrics of hyperedges by Attack Type in a hypergraph whose hyperedges are defined as source or destination IP addresses and whose nodes are the scanned destination ports. The chart shows hyperedges of destination IP, based on the $28,645$ records (benign or port scan) of the CIC-IDS2017 port scan dataset.
  • Figure 5: Distributions of the port scan Adversarial Examples Detection Score for four NIDS models: (A) the substitute NIDS model of a hacker who is assumed not to know that the model uses hypergraph $s$-closeness centrality metrics and who therefore uses only nine Raw Features (RFs), (B) the NIDS model this work trained with the nine RFs only, (C) the NIDS model trained with the dataset $HGI$, and (D) the NIDS model trained with the dataset $HGA$.
  • ...and 8 more figures
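
The hypergraph described in the captions of Figures 3 and 4 (IP addresses as hyperedges, scanned destination ports as nodes) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes the common definition of $s$-closeness centrality as closeness centrality on the $s$-line graph, where two hyperedges are adjacent if they share at least $s$ nodes, with the Wasserman-Faust scaling for disconnected graphs. The function names and the sample records are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_hyperedges(records):
    """Map each destination IP (hyperedge) to the set of destination ports (nodes)."""
    edges = defaultdict(set)
    for dst_ip, dst_port in records:
        edges[dst_ip].add(dst_port)
    return dict(edges)


def s_line_graph(edges, s):
    """Vertices are hyperedges with >= s nodes; two are adjacent if they share >= s nodes."""
    adj = {e: set() for e, nodes in edges.items() if len(nodes) >= s}
    for a, b in combinations(adj, 2):
        if len(edges[a] & edges[b]) >= s:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def s_closeness_centrality(edges, s=1):
    """Closeness of each hyperedge in the s-line graph (assumed definition)."""
    adj = s_line_graph(edges, s)
    n = len(adj)
    result = {}
    for src in adj:
        # Breadth-first search gives shortest path lengths from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reachable = len(dist) - 1
        total = sum(dist.values())
        if reachable == 0:
            result[src] = 0.0
        else:
            # Scale by the reachable fraction so small components score lower.
            result[src] = (reachable / total) * (reachable / (n - 1))
    return result


# Hypothetical (destination IP, destination port) records.
records = [
    ("10.0.0.1", 22), ("10.0.0.1", 80),
    ("10.0.0.2", 80), ("10.0.0.2", 443),
    ("10.0.0.3", 443),
    ("10.0.0.4", 9999),
]
hyperedges = build_hyperedges(records)
centrality_s1 = s_closeness_centrality(hyperedges, s=1)
centrality_s2 = s_closeness_centrality(hyperedges, s=2)
```

With these sample records, `10.0.0.2` shares a port with two other hosts and so has the highest 1-closeness (about 0.67), while `10.0.0.4` shares none and scores 0. At $s = 2$ no pair of hyperedges shares two ports, so every remaining hyperedge scores 0. Per-hyperedge values like these, computed for several values of $s$, are the kind of feature that could be appended to each record before training.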