Temporal Analysis of NetFlow Datasets for Network Intrusion Detection Systems

Majed Luay, Siamak Layeghy, Seyedehfaezeh Hosseininoorbin, Mohanad Sarhan, Nour Moustafa, Marius Portmann

TL;DR

The paper addresses the lack of temporal features in widely used NetFlow datasets for ML-based NIDS by introducing NF3, a NetFlow version 3 collection of UNSW-NB15, BoT-IoT, ToN-IoT, and CIC2018. NF3 adds precise flow start/end timing and inter-packet arrival time statistics, preserving original timestamps and enabling binary and multiclass labeling, all via an automated PCAP→NetFlow workflow using nProbe. The authors perform a thorough temporal analysis, including flow-length distributions, IAT patterns, per-minute flow counts, time-series representations for numerical and categorical features, and time-frequency representations such as spectrograms, to uncover attack-specific dynamics. They publicly release the enriched NF3 datasets to support cross-dataset temporal analysis and more robust evaluation of ML-based NIDS, aiming to improve detection of time-evolving and coordinated attacks. The work demonstrates that temporal features and TF representations can reveal distinct attack patterns, offering new directions for temporal-model-based intrusion detection and cross-dataset benchmarking.

Abstract

This paper investigates the temporal analysis of NetFlow datasets for machine learning (ML)-based network intrusion detection systems (NIDS). Although many previous studies have highlighted the critical role of temporal features in NIDS, such as inter-packet arrival time and flow length/duration, the currently available NetFlow datasets for NIDS lack these features. This study addresses the gap by creating and publicly releasing a set of NetFlow datasets that incorporate them [1]. Using these temporal features, we provide a comprehensive temporal analysis of NetFlow datasets by examining the distribution of various features over time and presenting time-series representations of NetFlow features. Such an analysis has not previously been provided in the literature. We also borrow an idea from signal processing, time-frequency analysis, and use it to compare the time-frequency signal presentations (TFSPs) of various attacks. The results indicate that many attacks have unique patterns, which could help ML models identify them more easily.
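
To make the time-frequency idea concrete, the following is a minimal sketch, not the authors' pipeline. It assumes flow start times are available as epoch milliseconds, bins them into per-minute flow counts, and computes a simple Hann-windowed short-time Fourier spectrogram with NumPy. The function names, the window and hop sizes, and the synthetic traffic (uniform benign flows plus a burst every 4 minutes) are all illustrative assumptions.

```python
import numpy as np

def per_minute_counts(start_ms, origin_ms, n_minutes):
    """Count flows per minute, given flow start times in milliseconds."""
    minutes = (np.asarray(start_ms, dtype=np.int64) - origin_ms) // 60_000
    keep = (minutes >= 0) & (minutes < n_minutes)   # drop flows outside the span
    return np.bincount(minutes[keep], minlength=n_minutes)

def spectrogram(signal, window=16, hop=8):
    """Squared magnitude of a Hann-windowed short-time Fourier transform."""
    signal = np.asarray(signal, dtype=float)
    taper = np.hanning(window)
    frames = [signal[i:i + window] * taper
              for i in range(0, len(signal) - window + 1, hop)]
    # Rows are time frames, columns are frequency bins (cycles per minute).
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2

# Synthetic one-hour capture (assumption, not data from the paper).
rng = np.random.default_rng(0)
benign = rng.integers(0, 3_600_000, size=2_000)          # uniform background
burst_minutes = np.arange(0, 60, 4)                      # a burst every 4 minutes
attack = np.concatenate([m * 60_000 + rng.integers(0, 60_000, size=300)
                         for m in burst_minutes])

counts_benign = per_minute_counts(benign, origin_ms=0, n_minutes=60)
counts_mixed = per_minute_counts(np.concatenate([benign, attack]),
                                 origin_ms=0, n_minutes=60)

spec_benign = spectrogram(counts_benign)
spec_mixed = spectrogram(counts_mixed)

# A 4-minute period is 0.25 cycles/minute, i.e. bin 4 of a 16-sample window.
print("bin-4 power, benign:", spec_benign[:, 4].mean())
print("bin-4 power, mixed: ", spec_mixed[:, 4].mean())
```

In this toy setting the periodic bursts concentrate energy at 0.25 cycles per minute and its harmonics, while the benign-only signal shows no such peak. That is the kind of attack-specific signature a time-frequency representation can expose; real traces would need window sizes chosen to match the time scales of interest.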

Paper Structure

This paper contains 15 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of the Dataset Conversion and Labeling Process
  • Figure 2: Flow length distribution in NF3-Datasets. The x-axis represents the length of flows in milliseconds, while the y-axis represents the frequency of a length, i.e., the number of flows with the same flow length.
  • Figure 3: Average distribution for Inter-Packet arrival time from source to destination.
  • Figure 4: Average distribution for Inter-Packet arrival time from destination to source.
  • Figure 5: Temporal Distribution of Network Traffic Across Four Datasets. This figure illustrates the minute-by-minute network traffic flow for NF3-Datasets on representative days, showcasing the onset, duration, and termination of various attack classes alongside benign traffic.
  • ...and 3 more figures