ConCap: Practical Network Traffic Generation for (ML- and) Flow-based Intrusion Detection Systems

Miel Verkerken, Laurens D'hooge, Bruno Volckaert, Filip De Turck, Giovanni Apruzzese

TL;DR

ConCap addresses the persistent data problem in NIDS research by providing an open-source, scenario-driven system that automatically generates and labels network traffic in isolated containers. It demonstrates that ConCap-produced NetFlows resemble those from real networks across a broad set of activities and can support both replicability of prior studies and development of ML-based NIDS for real-world deployments. The work also shows how ConCap enables testing against unseen threats (e.g., CVEs) and complex multi-step attack chains, while maintaining low resource overhead and high reproducibility through shareable scenario configurations. Overall, ConCap offers a practical, scalable foundation for trustworthy data curation in ML-driven intrusion detection, with broad implications for reproducibility and security assessment in diverse network environments.

Abstract

Network Intrusion Detection Systems (NIDS) have been studied for almost four decades. Yet, despite thousands of papers claiming scientific advances, a non-negligible number of recent works suggest that the findings of prior literature may be questionable. At the root of this disagreement is the well-known challenge of obtaining data that is representative of a real-world network and hence usable for security assessments. We tackle this challenge in this paper. We propose ConCap, a practical tool meant to facilitate experimental research on NIDS. Through ConCap, a researcher can set up an isolated and lightweight network environment and configure it to produce network-related data, such as packets or NetFlows, that are automatically labeled and hence ready for fine-grained experiments. ConCap is built on open-source software and is designed to foster experimental reproducibility across the scientific community: sharing a single configuration file is enough. Through comprehensive experiments on 10 different network activities, further expanded via in-depth analyses of 21 variants of two specific activities and of 100 repetitions of four other ones, we empirically verify that ConCap produces network data resembling that of a real-world network. We also carry out experiments on well-known benchmark datasets as well as on a real "smart-home" network, showing that, from a cyber-detection viewpoint, ConCap's automatically labeled NetFlows are functionally equivalent to those collected in other environments. Finally, we show that ConCap makes it possible to safely reproduce sophisticated attack chains (e.g., to test or enhance existing NIDS). Altogether, ConCap is a solution to the "data problem" that is plaguing NIDS research.
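
The idea of automatic labeling in an isolated environment can be illustrated with a minimal Python sketch. It is not ConCap's code or configuration format: the `Scenario` and `Flow` types, their field names, and the `label_flows` helper are all hypothetical. The only assumption carried over from the abstract is that a scenario runs in isolation, so every flow captured during it can inherit that scenario's label without manual annotation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of one isolated traffic-generation run."""
    name: str
    label: str


@dataclass(frozen=True)
class Flow:
    """Hypothetical, heavily simplified NetFlow record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int
    label: str = "unlabeled"


def label_flows(scenario: Scenario, flows: List[Flow]) -> List[Flow]:
    """Return copies of the flows tagged with the scenario's label.

    Assumes the capture contains only traffic produced by this scenario,
    which is what isolation is meant to guarantee.
    """
    return [replace(flow, label=scenario.label) for flow in flows]


# Illustrative values only; they are not taken from the paper.
scenario = Scenario(name="nmap-scan", label="port-scan")
captured = [
    Flow("10.0.0.2", "10.0.0.3", 22, 4),
    Flow("10.0.0.2", "10.0.0.3", 80, 6),
]
labeled = label_flows(scenario, captured)
```

Under this assumption the label is a property of the scenario rather than of individual packets, which is why a shared scenario description would be enough to regenerate both the traffic and its labels.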

Paper Structure

This paper contains 50 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Exemplary deployment scenario of an NIDS.
  • Figure 2: Overview of ConCap. [left] ConCap configured with two NetFlow extractors and three scenarios, [mid] executing all scenarios simultaneously on the cluster. [right] A view of a running scenario's attacker and target pods.
  • Figure 3: Multi-target scenario with ConCap. [left] ConCap supports advanced multi-step attack chains, [mid] performed over multiple targets. [right] The traffic is captured and processed per target, enabling per-target labeling.
  • Figure 4: NetFlow feature distribution of mean packet length for traffic generated by ConCap and a pair of physical hosts. The leftmost plot is for nmap and the rightmost is for patator. Additional plots in Fig. \ref{fig:feature_dist}.
  • Figure 5: Using ConCap to reproduce complex attack chains envisioned in MITRE ATT&CK. The attacker first compromises an exposed target before performing lateral movement to two internal hosts.
  • ...and 1 more figure