State Frequency Estimation for Anomaly Detection
Clinton Cao, Agathe Blaise, Annibale Panichella, Sicco Verwer
TL;DR
SEQUENT tackles the challenge of detecting network anomalies by learning a deterministic state machine from benign NetFlow traces and adapting anomaly scores at test time via state visit frequencies. It adds interpretability through root-cause symbols that link anomalous traces to specific flows, and demonstrates superior performance over multiple unsupervised baselines on three public datasets. The approach combines discrete feature encoding, a sliding-window trace construction, and the FlexFringe learning framework to produce a compact, interpretable model whose scoring adapts with observed behavior. Empirically, SEQUENT achieves higher AUC, maintains practical runtimes, and shows robustness to several adversarial strategies, making it a promising solution for scalable, explainable NetFlow anomaly detection.
Abstract
Many works have studied the efficacy of state machines for detecting anomalies within NetFlows. These works typically learn a model from unlabeled data and compute anomaly scores for arbitrary traces based on their likelihood of occurrence or how well they fit within the model. However, these methods do not dynamically adapt their scores based on the traces seen at test time. This becomes a problem when an adversary uses seemingly common traces in an attack: the model assigns them low anomaly scores and misses the attack. We propose SEQUENT, a new unsupervised approach that uses the state visit frequency of a state machine to adapt its scoring dynamically for anomaly detection. SEQUENT subsequently uses the scores to generate root causes for anomalies. These root causes allow alarms to be grouped and simplify the analysis of anomalies. We evaluate SEQUENT's effectiveness in detecting network anomalies on three publicly available NetFlow datasets and compare its performance against various existing unsupervised anomaly detection methods. Our evaluation shows promising results for using the state visit frequency of a state machine to detect network anomalies.
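To make the idea of scoring by state visit frequency concrete, the following is a minimal sketch under stated assumptions, not the SEQUENT implementation. It uses a plain prefix tree as a stand-in for the learned state machine (the actual model is learned with FlexFringe), and the scoring rule (excess visits per trace at test time relative to training, floored at zero) is a hypothetical choice made only for illustration. All function names are invented for this example.

```python
from collections import Counter

def build_prefix_tree(traces):
    """Build a prefix tree over symbol sequences (stand-in for a learned state machine)."""
    transitions = {}  # (state, symbol) -> next state; state 0 is the root
    next_id = 1
    for trace in traces:
        state = 0
        for symbol in trace:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_id
                next_id += 1
            state = transitions[key]
    return transitions

def visited_states(transitions, trace):
    """Return the states a trace visits; an unknown transition leads to sink state -1."""
    state, path = 0, []
    for symbol in trace:
        state = transitions.get((state, symbol), -1)
        path.append(state)
        if state == -1:
            break
    return path

def visit_counts(transitions, traces):
    """Count how often each state is visited across a set of traces."""
    counts = Counter()
    for trace in traces:
        counts.update(visited_states(transitions, trace))
    return counts

def score_traces(transitions, train_traces, test_traces):
    """Hypothetical score: largest per-state excess of test over training visit rate."""
    train = visit_counts(transitions, train_traces)
    test = visit_counts(transitions, test_traces)
    n_train, n_test = len(train_traces), len(test_traces)
    scores = []
    for trace in test_traces:
        path = visited_states(transitions, trace)
        # Visits per trace at test time minus visits per trace in training, floored at 0.
        excess = [max(0.0, test[s] / n_test - train[s] / n_train) for s in path]
        scores.append(max(excess, default=0.0))
    return scores

# Toy symbols standing in for discretized NetFlow features.
benign = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "b")]
observed = [("a", "c"), ("a", "c"), ("a", "c"), ("a", "b")]
model = build_prefix_tree(benign)
scores = score_traces(model, benign, observed)
```

In this toy run the trace ("a", "c") also occurs in the benign data, so a purely likelihood-based score would treat it as unremarkable. Because its final state is visited far more often at test time than in training, the frequency-based score flags it, while ("a", "b") scores zero.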
