Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS

Maho Kajiura, Junya Nakamura

TL;DR

The paper evaluates the practical performance of a distributed machine-learning-based NIDS (MLNIDS) framework by implementing five classifiers (DT, RF, NB, SVM, kNN) within a Tada2019-based pipeline that integrates Zeek, Apache Kafka, Kafka Streams, Logstash, and Elasticsearch. It measures throughput and latency on a dataset derived from UNSW-NB15 and gives a classifier-level and system-level view of bottlenecks, showing that classifier choice materially affects processing speed while detection performance remains comparable. DT and NB offer the highest throughput under heavy traffic, RF delivers strong classification performance at lower throughput, and the bottlenecks lie in the Zeek, Logstash, and Elasticsearch components. The study offers practical guidance on choosing a classifier for a given traffic volume and identifies areas for scaling improvements and future work, including deeper ML models and long-term stability.

Abstract

Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates for unknown attacks. A distributed processing framework for machine-learning-based NIDSs that employs a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) on this framework and evaluate their throughput and latency. Through experimental measurements, we investigate the differences in processing performance among these classifiers and the bottlenecks in the processing performance of the framework.

Paper Structure (15 sections, 2 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Process flow in the framework
  • Figure 2: Logstash process flow
  • Figure 3: Throughput and latency achieved by each classifier in the framework
  • Figure 4: Pods other than Elasticsearch
  • Figure 5: Elasticsearch Pods