Introducing a Comprehensive, Continuous, and Collaborative Survey of Intrusion Detection Datasets
Philipp Bönninghausen, Rafael Uetz, Martin Henze
TL;DR
This paper addresses the difficulty of choosing suitable intrusion detection datasets due to the sheer number of options and their limitations. It introduces COMIDDS, a GitHub-backed, continuously updated website that provides detailed, machine-readable entries for enterprise-network intrusion datasets, including environment, data formats, labeling, and sample records, plus links to publications. The authors describe their repository-based survey methodology, how datasets are identified and analyzed, and how statistics are generated from a central CSV dataset, demonstrating improvements over static prior surveys. The work aims to improve experimental realism and reproducibility by clarifying dataset limitations and enabling rapid, criterion-based dataset selection, with ongoing expansion and community contributions. Overall, COMIDDS promises a practical, up-to-date reference that supports robust intrusion detection research in real-world, enterprise contexts.
Abstract
Researchers in the highly active field of intrusion detection largely rely on public datasets for their experimental evaluations. However, the large number of existing datasets, the discovery of previously unknown flaws in them, and the frequent publication of new datasets make it hard to select suitable options and to sufficiently understand their respective limitations. Hence, there is a considerable risk of drawing invalid conclusions from experimental results about how well novel methods would detect attacks in the real world. Although various surveys on intrusion detection datasets exist, they do not give researchers a sound basis for decisions because they lack comprehensiveness, actionable details, and currency. In this paper, we present COMIDDS, an ongoing effort to comprehensively survey intrusion detection datasets with an unprecedented level of detail, implemented as a website backed by a public GitHub repository. COMIDDS allows researchers to quickly identify suitable datasets depending on their requirements and provides structured and critical information on each dataset, including actual data samples and links to relevant publications. COMIDDS is freely accessible, regularly updated, and open to contributions.
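
To illustrate what requirement-driven dataset selection over machine-readable entries could look like, the following minimal Python sketch filters a small CSV table by user-supplied criteria. The column names (`name`, `year`, `data_type`, `labeled`), the rows, and the helper `select_datasets` are hypothetical placeholders chosen for illustration; they are not the actual COMIDDS schema, data, or tooling.

```python
import csv
import io

# Hypothetical excerpt: column names and rows are illustrative only,
# not the actual COMIDDS schema or contents.
SAMPLE_CSV = """name,year,data_type,labeled
dataset_a,2015,network,yes
dataset_b,2019,host,no
dataset_c,2021,network,yes
"""


def select_datasets(csv_text, **criteria):
    """Return the names of all rows whose columns match every criterion."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["name"]
        for row in reader
        # A row qualifies only if each requested column has the requested value.
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Example requirement: labeled datasets containing network data.
matches = select_datasets(SAMPLE_CSV, data_type="network", labeled="yes")
print(matches)
```

All values are compared as strings, so a year would be passed as `year="2019"`. A real selection tool would likely need richer matching (ranges, multi-valued fields), which this sketch deliberately leaves out.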
