Control-flow anomaly detection by process mining-based feature extraction and dimensionality reduction

Francesco Vitale, Marco Pegoraro, Wil M. P. van der Aalst, Nicola Mazzocca

TL;DR

This work tackles control-flow anomalies in event logs by addressing the limitations of conformance checking with noisy data and low-quality models. It introduces a novel process mining-based feature extraction approach using alignment-based conformance checking to derive per-activity diagnostics, integrated into a framework that combines feature extraction with dimensionality reduction for reconstruction-based anomaly detection. The framework demonstrates strong, explainable performance across multiple public and real-world datasets, with best results reaching up to 97.3% F1 on PDC 2020 and 88.5% F1 on COVAS, while also explaining why traditional fitness-threshold baselines fail. The findings show that no single feature-extraction method is universally best, emphasize explainability, and point to future work on enriching data perspectives and advancing object-centric process mining.

Abstract

The business processes of organizations may deviate from normal control flow due to disruptive anomalies, including unknown, skipped, and wrongly-ordered activities. To identify these control-flow anomalies, process mining can check control-flow correctness against a reference process model through conformance checking, an explainable set of algorithms that allows linking any deviations with model elements. However, the effectiveness of conformance checking-based techniques is negatively affected by noisy event data and low-quality process models. To address these shortcomings and support the development of competitive and explainable conformance checking-based techniques for control-flow anomaly detection, we propose a novel process mining-based feature extraction approach with alignment-based conformance checking. This variant aligns the deviating control flow with a reference process model; the resulting alignment can be inspected to extract additional statistics such as the number of times a given activity caused mismatches. We integrate this approach into a flexible and explainable framework for developing techniques for control-flow anomaly detection. The framework combines process mining-based feature extraction and dimensionality reduction to handle high-dimensional feature sets, achieve detection effectiveness, and support explainability. The results show that the framework techniques implementing our approach outperform the baseline conformance checking-based techniques while maintaining the explainable nature of conformance checking. We also provide an explanation of why existing conformance checking-based techniques may be ineffective.
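
The pipeline described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes an alignment is given as a list of (log move, model move) pairs with `>>` marking "no move", counts per-activity mismatches as features, and then uses PCA (one possible dimensionality reduction technique, chosen here only for illustration) to flag traces by reconstruction error. The function names, the toy activities `a`, `b`, `c`, and the threshold rule are all hypothetical.

```python
from collections import Counter

import numpy as np

SKIP = ">>"  # assumed "no move" symbol in an alignment
ACTIVITIES = ["a", "b", "c"]  # hypothetical activity alphabet


def per_activity_mismatches(alignment, activities):
    """Count, per activity, the non-synchronous moves in one alignment."""
    counts = Counter()
    for log_move, model_move in alignment:
        if log_move == model_move:
            continue  # synchronous move: trace and model agree
        # Log move (activity only in the trace) or model move (only in the model).
        activity = log_move if model_move == SKIP else model_move
        counts[activity] += 1
    return [counts[a] for a in activities]


def fit_pca(X, k):
    """Return the training mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(X, mean, components):
    """Euclidean distance between each row and its low-dimensional reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)


# Toy alignments of normal traces: some fit perfectly, some skip "b".
fitting = [("a", "a"), ("b", "b"), ("c", "c")]
skips_b = [("a", "a"), (SKIP, "b"), ("c", "c")]
normal_alignments = [fitting, skips_b, fitting, skips_b]

# Toy alignment of an anomalous trace: "a" missing, two unexpected "c" events.
anomalous = [(SKIP, "a"), ("c", SKIP), ("b", "b"), ("c", "c"), ("c", SKIP)]

X_train = np.array(
    [per_activity_mismatches(al, ACTIVITIES) for al in normal_alignments],
    dtype=float,
)
mean, components = fit_pca(X_train, k=1)

# Assumed threshold rule: largest training error plus a small margin.
threshold = reconstruction_error(X_train, mean, components).max() + 1e-6

x_anomalous = np.array([per_activity_mismatches(anomalous, ACTIVITIES)], dtype=float)
x_normal = np.array([per_activity_mismatches(fitting, ACTIVITIES)], dtype=float)
anomaly_error = reconstruction_error(x_anomalous, mean, components)[0]
normal_error = reconstruction_error(x_normal, mean, components)[0]

print("anomalous trace flagged:", anomaly_error > threshold)
print("fitting trace flagged:", normal_error > threshold)
```

Because the per-activity counts remain visible as input features, a flagged trace can be traced back to the activities that caused the mismatches, which is the explainability property the abstract emphasizes. Silent transitions and move costs are ignored in this sketch.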

Paper Structure

This paper contains 47 sections, 10 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: An example event log footprint with six traces, of which five exhibit control-flow anomalies.
  • Figure 2: A high-level view of the proposed framework for combining process mining-based feature extraction with dimensionality reduction for control-flow anomaly detection.
  • Figure 3: The distribution of fitness values of case-study traces of a normal event log replayed on a reference Petri net. The inconsistent distribution of fitness values of the case-study traces makes fitness thresholding ineffective for control-flow anomaly detection.
  • Figure 4: Generation of alignment-based conformance checking diagnoses by replaying the $n$-tuple of event logs $\mathcal{L}$ against a Petri net $N$.
  • Figure 5: The connection between alignment-based conformance checking diagnoses of three traces and the reference Petri net of one of the datasets we use in the evaluation section.
  • ...and 4 more figures

Theorems & Definitions (12)

  • Definition 3.1: Event log, sublog and trace
  • Definition 3.2: Petri net
  • Definition 3.3: Fitness
  • Definition 3.4: Alignment and moves
  • Definition 3.5: Alignment-based fitness
  • Definition 3.6: Per-activity cost
  • Definition 3.7: Alignment-based conformance checking diagnoses
  • Definition 4.1: Process mining-based feature extraction
  • Definition 4.2: Dimensionality reduction technique
  • Definition 4.3: Anomaly detection
  • ...and 2 more