Model-driven Stochastic Trace Clustering

Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

TL;DR

The paper tackles the problem of exploding model complexity in process mining due to high variability by introducing model-driven stochastic trace clustering, which optimizes stochastic conformance within clusters using entropic relevance. It presents Entropic Clustering (EC), its initialization variants, and ECsplit, showing that clustering traces by probabilistic behavior yields simpler, more interpretable Directly-Follows Graphs and cluster models, with linear scalability. Extensive experiments on eight real-life logs demonstrate that EC methods improve stochastic coherence and graph clarity, though non-stochastic fitness can be mixed, highlighting a trade-off depending on the downstream analysis goal. The approach is particularly valuable for stochastic process analysis and DFG-centric tooling, offering a practical extension to conformance checking and visualization in large-scale event logs.

Abstract

Process discovery algorithms automatically extract process models from event logs, but high variability often results in complex and hard-to-understand models. To mitigate this issue, trace clustering techniques group process executions into clusters, each represented by a simpler and more understandable process model. Model-driven trace clustering improves on this by assigning traces to clusters based on their conformity to cluster-specific process models. However, most existing clustering techniques rely on either no process model discovery, or non-stochastic models, neglecting the frequency or probability of activities and transitions, thereby limiting their capability to capture real-world execution dynamics. We propose a novel model-driven trace clustering method that optimizes stochastic process models within each cluster. Our approach uses entropic relevance, a stochastic conformance metric based on directly-follows probabilities, to guide trace assignment. This allows clustering decisions to consider both structural alignment with a cluster's process model and the likelihood that a trace originates from a given stochastic process model. The method is computationally efficient, scales linearly with input size, and improves model interpretability by producing clusters with clearer control-flow patterns. Extensive experiments on public real-life datasets demonstrate that while our method yields superior stochastic coherence and graph simplicity, traditional fitness metrics reveal a trade-off, highlighting the specific utility of our approach for stochastic process analysis.
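
As a rough illustration of the assignment idea described in the abstract, the sketch below estimates directly-follows probabilities per cluster and assigns a trace to the cluster whose model encodes it in the fewest bits. It is a simplified stand-in, not the paper's algorithm: the function names, the artificial `<start>`/`<end>` markers, and the fixed `fallback_bits` penalty for traces a model cannot replay are all assumptions made for this example, and the actual entropic relevance measure may handle non-fitting traces differently.

```python
import math
from collections import Counter, defaultdict

# Artificial boundary markers (an assumption of this sketch).
START, END = "<start>", "<end>"


def dfg_probabilities(traces):
    """Estimate directly-follows transition probabilities from a list of traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def trace_cost_bits(trace, probs, fallback_bits=64.0):
    """Bits needed to encode the trace under the model.

    If any transition is missing from the model, return a fixed
    penalty (hypothetical stand-in for a background encoding cost).
    """
    path = [START, *trace, END]
    bits = 0.0
    for a, b in zip(path, path[1:]):
        p = probs.get(a, {}).get(b, 0.0)
        if p == 0.0:
            return fallback_bits
        bits -= math.log2(p)
    return bits


def assign_cluster(trace, cluster_models):
    """Return the index of the cluster model with the lowest encoding cost."""
    costs = [trace_cost_bits(trace, model) for model in cluster_models]
    return min(range(len(costs)), key=costs.__getitem__)


# Toy clusters with made-up activity labels.
cluster_a = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"]]
cluster_b = [["a", "d", "e"], ["a", "d", "d", "e"]]
models = [dfg_probabilities(cluster_a), dfg_probabilities(cluster_b)]

print(assign_cluster(["a", "b", "c"], models))  # cluster index for a trace seen in cluster_a
print(assign_cluster(["a", "d", "e"], models))  # cluster index for a trace seen in cluster_b
```

In this toy setup a trace that fits a cluster's graph is cheap to encode under that cluster's model and expensive under the other, which is the intuition behind letting a stochastic conformance score drive the assignment.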

Paper Structure

This paper contains 20 sections, 3 equations, 7 figures, 10 tables, 4 algorithms.

Figures (7)

  • Figure 1: Illustrative example of how the method selects the cluster.
  • Figure 2: Illustrative example showing how loops can make the ER of a single-trace DFG non-zero.
  • Figure 3: High-level overview of the experimental setup.
  • Figure 4: Critical difference diagrams for different metrics.
  • Figure 5: The elbow experiment with (simplified) ER, averaged per trace.
  • ...and 2 more figures