ConCap: Practical Network Traffic Generation for (ML- and) Flow-based Intrusion Detection Systems
Miel Verkerken, Laurens D'hooge, Bruno Volckaert, Filip De Turck, Giovanni Apruzzese
TL;DR
ConCap addresses the persistent data problem in NIDS research by providing an open-source, scenario-driven system that automatically generates and labels network traffic in isolated containers. It demonstrates that ConCap-produced NetFlows resemble those from real networks across a broad set of activities and can support both replicability of prior studies and development of ML-based NIDS for real-world deployments. The work also shows how ConCap enables testing against unseen threats (e.g., CVEs) and complex multi-step attack chains, while maintaining low resource overhead and high reproducibility through shareable scenario configurations. Overall, ConCap offers a practical, scalable foundation for trustworthy data curation in ML-driven intrusion detection, with broad implications for reproducibility and security assessment in diverse network environments.
Abstract
Network Intrusion Detection Systems (NIDS) have been studied in research for almost four decades. Yet, despite thousands of papers claiming scientific advances, a non-negligible number of recent works suggest that the findings of prior literature may be questionable. At the root of this disagreement is the well-known challenge of obtaining data that is representative of a real-world network and, hence, usable for security assessments. We tackle this challenge in this paper. We propose ConCap, a practical tool meant to facilitate experimental research on NIDS. Through ConCap, a researcher can set up an isolated and lightweight network environment and configure it to produce network-related data, such as packets or NetFlows, that are automatically labeled and hence ready for fine-grained experiments. ConCap is built on open-source software and is designed to foster experimental reproducibility across the scientific community: sharing a single configuration file is enough. Through comprehensive experiments on 10 different network activities, further expanded via in-depth analyses of 21 variants of two specific activities and of 100 repetitions of four other ones, we empirically verify that ConCap produces network data resembling that of a real-world network. We also carry out experiments on well-known benchmark datasets as well as on a real "smart-home" network, showing that, from a cyber-detection viewpoint, ConCap's automatically labeled NetFlows are functionally equivalent to those collected in other environments. Finally, we show that ConCap enables the safe reproduction of sophisticated attack chains (e.g., to test or enhance existing NIDS). Altogether, ConCap is a solution to the "data problem" that is plaguing NIDS research.
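To make the idea of automatic labeling concrete, the following is a minimal Python sketch and not ConCap's actual code or configuration schema. It assumes that, because each scenario runs in its own isolated environment, every flow captured during that run can inherit the scenario's label. The names `Scenario`, `label_flows`, and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # Hypothetical fields; they do not mirror ConCap's real scenario format.
    name: str
    label: str
    category: str


def label_flows(flows, scenario):
    """Return copies of the flow records tagged with the scenario's labels.

    Assumption: the capture holds only traffic from this one scenario,
    so no per-flow matching against IPs or timestamps is needed.
    """
    return [
        {
            **flow,
            "label": scenario.label,
            "category": scenario.category,
            "scenario": scenario.name,
        }
        for flow in flows
    ]


# Illustrative flow records (made-up values, not measured data).
example_flows = [
    {"src": "10.0.0.2", "dst": "10.0.0.3", "dst_port": 22, "packets": 40},
    {"src": "10.0.0.2", "dst": "10.0.0.3", "dst_port": 22, "packets": 38},
]
example_scenario = Scenario(
    name="ssh-bruteforce", label="malicious", category="bruteforce"
)
example_labeled = label_flows(example_flows, example_scenario)
```

The input records are left unmodified, so the same capture could be relabeled under a different taxonomy without re-running the scenario.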
