Synthetic Embedding of Hidden Information in Industrial Control System Network Protocols for Evaluation of Steganographic Malware
Tom Neubert, Bjarne Peuker, Laura Buxhoidt, Eric Schueler, Claus Vielhauer
TL;DR
This work tackles the shortage of realistic steganographic ICS data by introducing the Synthetic Steganographic Embedding (SSE) concept, which generates synthetic steganographic network data from uncompromised ICS traces. It offers two embedding options: SEO_A for ultra-fast embedding via hexdump manipulation and SEO_B for more structured embedding via JSON representations. Both are integrated into a four-segment workflow that includes recording, embedding, and retrieval. The SSE approach significantly increases the embedding pace compared with prior methods such as ARES21 and enables large-scale data generation to train and evaluate defenses against steganographic malware in ICS/OT-IT environments. Because SSE leaves the rest of the network traffic untouched, it maintains plausible data conditions for defense testing, supporting practical improvement of detection and assessment tools for industrial networks.
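To illustrate the difference between the two embedding options, the following is a minimal Python sketch, not the authors' implementation. It assumes that a captured packet is available either as a raw hex string (standing in for SEO_A's hexdump manipulation) or as a flat JSON object of named fields (standing in for SEO_B's JSON representation). The function names `embed_hex` and `embed_json`, the field name `transaction_id`, and the Modbus/TCP-like example packet are hypothetical and chosen only for illustration.

```python
import json


def embed_hex(packet_hex: str, offset: int, secret: bytes) -> str:
    """Hypothetical SEO_A-style sketch: overwrite raw bytes of a packet,
    given as a hex string, starting at byte position `offset`."""
    raw = bytearray.fromhex(packet_hex)
    # Refuse embeddings that would change the packet length.
    if offset < 0 or offset + len(secret) > len(raw):
        raise ValueError("secret does not fit at the given offset")
    raw[offset:offset + len(secret)] = secret
    return raw.hex()


def embed_json(packet_json: str, field: str, value: int) -> str:
    """Hypothetical SEO_B-style sketch: set a named field of a packet
    represented as a flat JSON object (field name -> value)."""
    fields = json.loads(packet_json)
    if field not in fields:
        raise KeyError(f"unknown field: {field}")
    fields[field] = value
    return json.dumps(fields)


# Hypothetical 12-byte Modbus/TCP-like request (MBAP header + PDU);
# bytes 0-1 are the transaction ID, used here as the embedding location.
original = "000100000006010300000002"
print(embed_hex(original, 0, b"\xab\xcd"))  # prints abcd00000006010300000002

structured = '{"transaction_id": 1, "function_code": 3}'
print(embed_json(structured, "transaction_id", 0xABCD))
```

The byte-level variant is fast because it needs no protocol parsing, whereas the field-level variant requires a structured representation of the packet but lets the embedding target a named protocol field. In both sketches, all bytes or fields outside the chosen location are left unchanged.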
Abstract
For several years, attackers have increasingly used steganographic techniques to embed hidden information in network communications in order to obscure data infiltration, exfiltration, or command and control in IT (information technology) and OT (operational technology) systems. Industrial control systems (ICS) and critical infrastructures in particular have increased protection requirements. Unfortunately, current network defense mechanisms are quite ineffective against novel attacks based on network steganography. Thus, on the one hand, huge amounts of network data with steganographic embeddings are required to train, evaluate, and improve defense mechanisms. On the other hand, embedding hidden information in real time in productive ICS networks is problematic because of potential safety violations. Additionally, it is time-consuming because it requires a special laboratory setup. To address this challenge, this work introduces an embedding concept that generates synthetic steganographic network data, automatically producing significant amounts of data for training and evaluating defense mechanisms. The concept makes it possible to manipulate a network packet wherever required and significantly outperforms the state of the art in terms of embedding pace.
