GothX: a generator of customizable, legitimate and malicious IoT network traffic
Manuel Poisson, Rodrigo Carnier, Kensuke Fukuda
TL;DR
The paper tackles the challenge of producing realistic, labeled IoT traffic for ML-based anomaly detection by introducing GothX, an open-source, highly configurable traffic generator built as an improved fork of Gotham. GothX supports automated scenario execution on heterogeneous networks (MQTT, Kafka and SINETStream services) and automatic labeling. It is validated through two use cases: replication of MQTTset and a novel six-step attack scenario leading to a DDoS. Each use case yields a new labeled dataset. The paper demonstrates scalability to about 450 IoT sensors on a single machine and discusses replication, realism, and performance trade-offs, arguing that GothX advances the state of the art in IoT traffic generation. For researchers and practitioners, the practical benefit is repeatable, customizable generation of labeled mixed-traffic datasets for ML-based anomaly detection in IoT networks, with the tool released as open source for community use.
Abstract
In recent years, machine learning-based anomaly detection (AD) has become an important measure against security threats from Internet of Things (IoT) networks. Machine learning (ML) models for network traffic AD require datasets for training, evaluation and comparison. Because these datasets must give a realistic and up-to-date representation of IoT security threats, new ones need to be generated constantly to train relevant AD models. Since most traffic generation setups are developed with only their authors' own use in mind, replicating traffic generation becomes an additional challenge to the creation and maintenance of useful datasets. In this work, we propose GothX, a flexible traffic generator that creates both legitimate and malicious traffic for IoT datasets. As a fork of Gotham Testbed, GothX is developed with five requirements: 1) easy configuration of network topology, 2) customization of traffic parameters, 3) automatic execution of legitimate and attack scenarios, 4) IoT network heterogeneity (the current iteration supports MQTT, Kafka and SINETStream services), and 5) automatic labeling of generated datasets. GothX is validated by two use cases: a) re-generation and enrichment of traffic from the IoT dataset MQTTset, and b) automatic execution of a new realistic scenario that includes the exploitation of a CVE specific to the Kafka-MQTT network topology and leads to a DDoS attack. We also contribute two datasets containing mixed traffic, one made from the enriched MQTTset traffic and another from the attack scenario. We evaluated the scalability of GothX (450 IoT sensors on a single machine), the replication of the use cases and the validity of the generated datasets, confirming the ability of GothX to improve the current state of the art of network traffic generation.
