Enhancing Automata Learning with Statistical Machine Learning: A Network Security Case Study
Negin Ayoughi, Shiva Nejati, Mehrdad Sabetzadeh, Patricio Saavedra
TL;DR
The paper addresses the verification of network intrusion detection systems by learning compact, interpretable state machines from numeric time-series network data. It introduces MELA, a passive automata-learning pipeline augmented with ML-based trace abstraction (variable selection via information gain and range abstraction via decision trees) that produces Moore automata accurately reflecting RRTRouter behaviours. Empirical results on a realistic testbed show an average 67.5% reduction in the number of states and transitions and an average 28% improvement in accuracy over a manually abstracted (expertise-based) baseline. The resulting automata support model checking and temporal query checking, both for verifying security requirements and for exploring previously unknown behaviours. The work demonstrates a practical path to interpretable behavioural modelling of intrusion detection systems in numerically rich, real-world settings, with publicly available data and tools for replication.
Abstract
Intrusion detection systems are crucial for network security. Verification of these systems is complicated by various factors, including the heterogeneity of network platforms and the continuously changing landscape of cyber threats. In this paper, we use automata learning to derive state machines from network-traffic data with the objective of supporting behavioural verification of intrusion detection systems. The most innovative aspect of our work is addressing the inability to directly apply existing automata learning techniques to network-traffic data due to the numeric nature of such data. Specifically, we use interpretable machine learning (ML) to partition numeric ranges into intervals that strongly correlate with a system's decisions regarding intrusion detection. These intervals are subsequently used to abstract numeric ranges before automata learning. We apply our ML-enhanced automata learning approach to a commercial network intrusion detection system developed by our industry partner, RabbitRun Technologies. Our approach results in an average 67.5% reduction in the number of states and transitions of the learned state machines, while achieving an average 28% improvement in accuracy compared to using expertise-based numeric data abstraction. Furthermore, the resulting state machines help practitioners in verifying system-level security requirements and exploring previously unknown system behaviours through model checking and temporal query checking. We make our implementation and experimental data available online.
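
To make the range-abstraction idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single numeric variable (hypothetically named packets_per_second) and a per-sample verdict label (hypothetically "allow" or "block"). It picks cut points by information gain, as a depth-limited decision tree would, and then maps each number to an interval label that a passive automata learner could treat as a discrete symbol. All function names, variable names, and data values here are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best binary split, or (None, 0.0)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate two equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain

def learn_cut_points(values, labels, max_depth=2):
    """Recursively split the numeric range, like a depth-limited decision tree."""
    if max_depth == 0 or len(set(labels)) < 2:
        return []
    t, _ = best_threshold(values, labels)
    if t is None:
        return []
    left = [(v, lab) for v, lab in zip(values, labels) if v <= t]
    right = [(v, lab) for v, lab in zip(values, labels) if v > t]
    cuts = learn_cut_points([v for v, _ in left], [lab for _, lab in left], max_depth - 1)
    cuts.append(t)
    cuts += learn_cut_points([v for v, _ in right], [lab for _, lab in right], max_depth - 1)
    return cuts

def abstract_value(value, cuts):
    """Map a number to a symbolic interval label built from sorted cut points."""
    bounds = [float("-inf")] + cuts + [float("inf")]
    for lo, hi in zip(bounds, bounds[1:]):
        if lo < value <= hi:
            return f"({lo}, {hi}]"

# Hypothetical toy data: traffic rate samples and the system's verdict for each.
packets_per_second = [5, 8, 12, 90, 110, 130]
verdicts = ["allow", "allow", "allow", "block", "block", "block"]

cuts = learn_cut_points(packets_per_second, verdicts)
abstract_trace = [abstract_value(v, cuts) for v in packets_per_second]
print(cuts, abstract_trace)
```

The abstracted trace contains only a handful of distinct symbols instead of arbitrary numbers, which is what keeps the learned state machine small. The same information-gain score could also be used to rank variables and keep only the most informative ones, in the spirit of the variable-selection step mentioned in the TL;DR.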
