ASTD Patterns for Integrated Continuous Anomaly Detection In Data Logs
Chaymae El Jabri, Marc Frappier, Pierre-Martin Tardif
TL;DR
This work uses Algebraic State Transition Diagrams (ASTD), extended with a new Quantified Flow operator, to specify and execute an ensemble of unsupervised anomaly detectors over streaming data logs. A sliding window supports continuous learning: training data is renewed at window boundaries, and multiple models (K-means, KDE, LOF) are combined by majority voting to classify each event. The contributions are the Quantified Flow extension, which enables scalable per-model concurrency and modular composition, and a concrete specification pattern that can be adapted to other unsupervised methods. Experiments on CERT Insider Threat data show that windowing combined with ensemble fusion improves detection rates across configurations, and highlight the importance of the parameters chosen for window size and renewal. Overall, the paper demonstrates how ASTD can serve as a high-level, executable framework for designing modular, reusable anomaly-detection pipelines for data streams.
Abstract
This paper investigates the use of the ASTD language for ensemble anomaly detection in data logs. It uses a sliding window technique for continuous learning in data streams, and updates the learning models upon completion of each window so that detection remains accurate and aligned with current data trends. It proposes ASTD patterns for combining learning models, especially in the context of unsupervised learning, which is commonly used for data streams. To facilitate this, a new ASTD operator is proposed, the Quantified Flow, which enables learning models to be combined seamlessly while keeping the specification concise. Our contribution is a specification pattern that highlights the capacity of ASTDs to abstract and modularize anomaly detection systems. The ASTD language provides a unique approach to developing data flow anomaly detection systems, grounded in the combination of processes through the graphical representation of the language operators. This simplifies the design task for developers, who can focus primarily on defining the functional operations that constitute the system.
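To make the windowed ensemble idea concrete, the following is a minimal Python sketch, not the ASTD specification itself. It assumes a stream of numeric events, a window that is refit each time it fills, and a per-event majority vote. The three detectors are deliberately simple standard-library stand-ins for the K-means, KDE, and LOF models named above. All class and function names (`ZScoreDetector`, `MadDetector`, `RangeDetector`, `detect`) and all thresholds are hypothetical and chosen for illustration only.

```python
from statistics import mean, median, pstdev


class ZScoreDetector:
    """Stand-in model: flags values far from the window mean."""

    def fit(self, window):
        self.mu = mean(window)
        self.sigma = pstdev(window) or 1.0  # avoid division by zero

    def is_anomaly(self, x):
        return abs(x - self.mu) / self.sigma > 3.0


class MadDetector:
    """Stand-in model: flags values far from the window median."""

    def fit(self, window):
        self.med = median(window)
        self.mad = median([abs(v - self.med) for v in window]) or 1.0

    def is_anomaly(self, x):
        return abs(x - self.med) / self.mad > 5.0


class RangeDetector:
    """Stand-in model: flags values outside a padded min/max range."""

    def fit(self, window):
        lo, hi = min(window), max(window)
        margin = 0.5 * (hi - lo)
        self.lo, self.hi = lo - margin, hi + margin

    def is_anomaly(self, x):
        return x < self.lo or x > self.hi


def detect(stream, detectors, window_size):
    """Return one verdict per event: None before the first fit,
    otherwise True when a strict majority of detectors flags it."""
    buffer, trained, verdicts = [], False, []
    for x in stream:
        if trained:
            votes = sum(d.is_anomaly(x) for d in detectors)
            verdicts.append(votes * 2 > len(detectors))
        else:
            verdicts.append(None)  # no trained model yet
        buffer.append(x)
        if len(buffer) == window_size:
            # Window complete: renew the training data and refit every model.
            for d in detectors:
                d.fit(buffer)
            buffer, trained = [], True
    return verdicts


stream = [10, 11, 9, 10, 12, 10, 50, 11, 10, 9, 10, 11]
verdicts = detect(stream, [ZScoreDetector(), MadDetector(), RangeDetector()], 5)
```

In this sketch each detector is refit independently on the same window, which loosely mirrors the per-model composition described above. Flagged events are not filtered out before refitting, so an anomaly in one window can influence the models trained on it; a real pipeline would need to decide how to handle that.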
