Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS

Maho Kajiura; Junya Nakamura

TL;DR

This paper evaluates the practical performance of a distributed framework for machine-learning-based NIDSs (MLNIDSs). It implements five classifiers, namely Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), SVM, and kNN, within a Tada2019-based pipeline that integrates Zeek, Apache Kafka, Kafka Streams, Logstash, and Elasticsearch, and measures their throughput and latency on a UNSW-NB15-derived dataset. The results give a classifier- and system-level view of bottlenecks: classifier choice materially affects processing speed, while detection performance remains comparable. DT and NB offer the highest throughput under heavy traffic, RF delivers strong classification performance at lower throughput, and the bottlenecks lie in the Zeek, Logstash, and Elasticsearch components. The study offers practical guidance on selecting a classifier for a given traffic volume and highlights areas for scaling improvements and future work, including deeper ML models and long-term stability.

Abstract

Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates for unknown attacks. A distributed processing framework for machine-learning-based NIDSs that employs a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) based on this framework and evaluate their throughput and latency. Through experimental measurements, we investigate the differences in processing performance among these classifiers and the bottlenecks in the processing performance of the framework.

Paper Structure (15 sections, 2 equations, 5 figures, 6 tables)

Introduction
Related Work
Construction of MLNIDS using a framework
Framework Overview
Implementation of a Machine-Learning-based Classification
Evaluation method
Dataset
Experimental Environment
Performance Metrics
Performance Limits in the Experimental Environment
Experimental Results
Classifier Performance
Maximum Processing Speed
CPU Usage of Each Subsystem
Conclusion

Figures (5)

Figure 1: Process flow in the framework
Figure 2: Logstash process flow
Figure 3: Throughput and latency achieved by each classifier in the framework
Figure 4: Pods other than Elasticsearch
Figure 5: Elasticsearch Pods
