Control-flow anomaly detection by process mining-based feature extraction and dimensionality reduction
Francesco Vitale, Marco Pegoraro, Wil M. P. van der Aalst, Nicola Mazzocca
TL;DR
This work tackles control-flow anomaly detection in event logs by addressing the limitations of conformance checking in the presence of noisy data and low-quality models. It introduces a novel process mining-based feature extraction approach that uses alignment-based conformance checking to derive per-activity diagnostics, and integrates it into a framework that combines feature extraction with dimensionality reduction for reconstruction-based anomaly detection. The framework demonstrates strong, explainable performance across multiple public and real-world datasets, with best F1 scores of up to 97.3% on PDC 2020 and 88.5% on COVAS, and the work also explains why traditional fitness-threshold baselines fail. The findings show that no single feature-extraction method is universally best, emphasize explainability, and point to future work on enriching data perspectives and advancing object-centric process mining.
Abstract
The business processes of organizations may deviate from normal control flow due to disruptive anomalies, including unknown, skipped, and wrongly ordered activities. To identify these control-flow anomalies, process mining can check control-flow correctness against a reference process model through conformance checking, a set of explainable algorithms that link deviations to model elements. However, the effectiveness of conformance checking-based techniques is negatively affected by noisy event data and low-quality process models. To address these shortcomings and support the development of competitive and explainable conformance checking-based techniques for control-flow anomaly detection, we propose a novel process mining-based feature extraction approach built on alignment-based conformance checking. Alignment-based conformance checking aligns the deviating control flow with a reference process model; the resulting alignment can be inspected to extract additional statistics, such as the number of times a given activity caused mismatches. We integrate this approach into a flexible and explainable framework for developing control-flow anomaly detection techniques. The framework combines process mining-based feature extraction with dimensionality reduction to handle high-dimensional feature sets, achieve detection effectiveness, and support explainability. The results show that the framework techniques implementing our approach outperform the baseline conformance checking-based techniques while maintaining the explainable nature of conformance checking. We also explain why existing conformance checking-based techniques may be ineffective.
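The idea of extracting per-activity mismatch statistics from an alignment can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an alignment is already available as a list of (log label, model label) pairs, with ">>" marking the side that makes no move; the function name `mismatch_features`, the two-counts-per-activity feature layout, and the extra bucket for unknown activities are all hypothetical choices made for illustration.

```python
from collections import Counter

SKIP = ">>"  # assumed "no move" symbol; conformance checking tools may use another marker


def mismatch_features(alignment, activities):
    """Turn one alignment into a fixed-length vector of per-activity mismatch counts.

    alignment:  list of (log_label, model_label) pairs (assumed input format)
    activities: ordered list of activity labels known to the reference model
    Returns [log_moves(a1), model_moves(a1), ..., log_moves(an), model_moves(an), unknown].
    """
    log_moves = Counter()    # activity occurred in the trace, but the model did not allow it there
    model_moves = Counter()  # model required the activity, but the trace skipped it
    for log_label, model_label in alignment:
        if log_label != SKIP and model_label == SKIP:
            log_moves[log_label] += 1
        elif log_label == SKIP and model_label != SKIP:
            model_moves[model_label] += 1
        # synchronous moves (both sides agree) are not mismatches and are ignored

    known = set(activities)
    features = []
    for activity in activities:
        features.extend([log_moves[activity], model_moves[activity]])
    # Hypothetical extra feature: log moves on activities the model does not know at all.
    features.append(sum(n for label, n in log_moves.items() if label not in known))
    return features


# Illustrative data: the model expects register, check, approve, pay;
# the trace executed "approve" before "check" (a wrongly ordered activity).
activities = ["register", "check", "approve", "pay"]
alignment = [
    ("register", "register"),
    ("approve", SKIP),   # log move: "approve" came too early
    ("check", "check"),
    (SKIP, "approve"),   # model move: "approve" was expected here
    ("pay", "pay"),
]
vector = mismatch_features(alignment, activities)
```

Vectors like this one, computed for every trace, form the kind of high-dimensional feature set that a dimensionality reduction step could then compress and reconstruct. Because each entry maps to a named activity, large values point directly at the activities behind a deviation.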
