Performance Analysis: Discovering Semi-Markov Models From Event Logs

Anna Kalenkova; Lewis Mitchell; Matthew Roughan

TL;DR

The paper addresses the challenge of evaluating the performance of processes discovered from event logs without resorting to simulation. It introduces semi-Markov models that accommodate arbitrary waiting-time distributions and develops two analytical tracks: express analysis, which quickly estimates the mean execution time, and full analysis, which derives the complete PDF of the total execution time, modeling per-transition times with either continuous Gaussian mixtures or discrete histograms. The authors prove results for computing mean times from limiting probabilities and build PDFs through a reduction/convolution framework. They validate both tracks on real-world logs (BPI'13, DD'20, RFP'20), showing that the discrete approach is faster for small duration supports, while continuous GMMs yield compact, interpretable models with competitive accuracy. The work enables practical what-if analyses and offers a scalable alternative to simulation, demonstrated on large-scale incident and claim-handling datasets.
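As a rough illustration of the reduction/convolution idea for discrete histograms, the sketch below reduces three kinds of process fragments to a single duration histogram. It is a minimal sketch under our own assumptions, not the authors' algorithm: durations are integer time units, fragments are only sequences, exclusive choices, and loops, and the helper names `sequence`, `choice`, and `loop` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: integer duration (in time units) -> probability.
Hist = Dict[int, float]


def sequence(a: Hist, b: Hist) -> Hist:
    """Duration of fragment a followed by fragment b: discrete convolution."""
    out: Dict[int, float] = defaultdict(float)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] += pa * pb
    return dict(out)


def choice(branches: List[Tuple[float, Hist]]) -> Hist:
    """Exclusive choice: mixture of branch histograms weighted by branch probabilities."""
    out: Dict[int, float] = defaultdict(float)
    for prob, hist in branches:
        for t, p in hist.items():
            out[t] += prob * p
    return dict(out)


def loop(body: Hist, repeat_prob: float, max_iter: int = 50) -> Hist:
    """Body runs once, then repeats with probability repeat_prob.
    n iterations occur with probability (1 - r) * r**(n - 1); the series is
    truncated after max_iter iterations, so a tiny tail mass is dropped."""
    out: Dict[int, float] = defaultdict(float)
    current, weight = body, 1.0 - repeat_prob
    for _ in range(max_iter):
        for t, p in current.items():
            out[t] += weight * p
        current = sequence(current, body)
        weight *= repeat_prob
    return dict(out)


# Usage: A takes 1 or 2 units, then B (3 units) is repeated with probability 0.5.
a = {1: 0.5, 2: 0.5}
total = sequence(a, loop({3: 1.0}, 0.5))
print(sum(t * p for t, p in total.items()))  # approximately 7.5 (1.5 + 3 * 2)
```

The cost of `sequence` grows with the product of the two support sizes, which is consistent with the observation that the discrete approach pays off mainly when duration supports are small.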

Abstract

Process mining is a well-established data analysis discipline focused on discovering process models from information systems' event logs. Recently, an emerging subarea of process mining, known as stochastic process discovery, has started to evolve. Stochastic process discovery takes the frequencies of events in the event data into account and allows for a more comprehensive analysis. In particular, when activity durations are present in the event log, performance characteristics of the discovered stochastic models can be analyzed; for example, the overall process execution time can be estimated. Existing performance analysis techniques usually discover stochastic process models from event data and then simulate these models to evaluate their execution times; that is, they rely on empirical approaches. This paper proposes analytical techniques for performance analysis that derive statistical characteristics of overall process execution times in the presence of arbitrary time distributions of events, modeled by semi-Markov processes. The proposed methods include express analysis, focused on estimating the mean execution time, and full analysis techniques that build probability density functions (PDFs) of process execution times in both continuous and discrete forms. These methods are implemented and tested on real-world event data, demonstrating their potential for what-if analysis by providing solutions without resorting to simulation. Specifically, we demonstrate that the discrete approach is more time-efficient than simulation for small duration support sizes. Furthermore, we show that the continuous approach, with PDFs represented as Gaussian Mixture Models (GMMs), facilitates the discovery of more compact and interpretable models.
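The compactness of the continuous representation rests on a standard fact: the density of the sum of two independent Gaussian-mixture-distributed durations is again a Gaussian mixture, with one component per pair of components, whose weights multiply and whose means and variances add. The sketch below illustrates that fact only; the `Component` class and the `convolve`, `mix`, `pdf`, and `mean` helpers are hypothetical names, and the code is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    weight: float
    mean: float
    var: float  # variance (sigma squared)


GMM = List[Component]


def convolve(a: GMM, b: GMM) -> GMM:
    """Density of X + Y for independent X ~ a and Y ~ b.
    Convolving two Gaussians gives a Gaussian whose mean and variance are the sums,
    so the result is a mixture with one component per pair (weights multiply)."""
    return [Component(ca.weight * cb.weight, ca.mean + cb.mean, ca.var + cb.var)
            for ca in a for cb in b]


def mix(branches: List[Tuple[float, GMM]]) -> GMM:
    """Exclusive choice: scale each branch's component weights by its probability."""
    return [Component(p * c.weight, c.mean, c.var) for p, gmm in branches for c in gmm]


def pdf(gmm: GMM, x: float) -> float:
    """Evaluate the mixture density at x."""
    return sum(c.weight * math.exp(-(x - c.mean) ** 2 / (2 * c.var))
               / math.sqrt(2 * math.pi * c.var) for c in gmm)


def mean(gmm: GMM) -> float:
    return sum(c.weight * c.mean for c in gmm)


# Usage: a bimodal activity (around 1 or 3 time units) followed by a unimodal one.
x = [Component(0.5, 1.0, 1.0), Component(0.5, 3.0, 1.0)]
y = [Component(1.0, 2.0, 0.5)]
total = convolve(x, y)
print(mean(total))  # 4.0
```

The number of components grows multiplicatively with each convolution, so a practical tool would need to merge or refit components to keep the model compact; that step is omitted here.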

Paper Structure (17 sections, 4 theorems, 14 equations, 18 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Process Discovery
Performance Analysis
Express analysis. Deriving Mean of Process Execution Time
Full Analysis Of The Process Execution Time
Continuous Time Distributions
Discrete Time Distributions
Case Studies
Applying Express Analysis to Real-World Data
Applying Full Analysis to Real-World Event Data
Deriving Time Distributions
Relating Discovered and Observed Process Execution Times
Conclusion
...and 2 more sections

Key Result

Theorem 1

For an irreducible Markov chain with period $d$, the limiting probabilities $\pi_j=\frac{1}{d}\lim\limits_{n\to\infty} p^{dn}_{i,j}$ exist for all states $j$ and are independent of the initial state $i$. The $\pi_j$ are the unique non-negative solutions of $\pi_j=\sum_{i=1}^m \pi_i p_{i,j}$ such that $\sum_{j=1}^m \pi_j = 1$.
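The limiting probabilities in Theorem 1 can be computed by solving the balance equations together with the normalization constraint $\sum_j \pi_j = 1$. The sketch below is our own illustration, using exact rational arithmetic. Its `mean_execution_time` function applies a standard renewal argument that we assume is in the spirit of the express analysis, not the paper's exact derivation: if the final state is connected back to the initial state, the expected number of visits to state $j$ during one execution is $\pi_j/\pi_{start}$, and weighting these counts by mean holding times $h_j$ (hypothetical values here) gives the mean execution time.

```python
from fractions import Fraction
from typing import List, Sequence


def limiting_probabilities(P: Sequence[Sequence]) -> List[Fraction]:
    """Solve pi_j = sum_i pi_i * p_ij with sum_j pi_j = 1 for an irreducible chain."""
    m = len(P)
    # Row j encodes sum_i pi_i * (p_ij - [i == j]) = 0.
    A = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(m)] for j in range(m)]
    b = [Fraction(0)] * m
    # The balance equations sum to zero, so one is redundant: replace it by normalization.
    A[-1] = [Fraction(1)] * m
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(m)]


def mean_execution_time(P, holding, start: int) -> Fraction:
    """Mean time per execution, assuming the final state is linked back to start.
    pi_j / pi_start is the expected number of visits to state j per execution."""
    pi = limiting_probabilities(P)
    return sum(pi[j] / pi[start] * holding[j] for j in range(len(P)))


# States: 0 = start, 1 = A, 2 = B, 3 = end; B loops back to A with probability 1/3.
F = Fraction
P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, F(1, 3), 0, F(2, 3)],
     [1, 0, 0, 0]]
holding = [0, 2, 5, 0]  # hypothetical mean holding times per state
print(limiting_probabilities(P))
print(mean_execution_time(P, holding, start=0))  # 21/2
```

Solving the linear system directly, rather than iterating $P^n$, also works for periodic chains such as this example (its period is 2), where the $n$-step probabilities oscillate instead of converging.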

Figures (18)

Figure 1: A Markov chain with three states.
Figure 2: An aperiodic Markov chain.
Figure 3: A Markov process flow.
Figure 4: A Markov process flow discovered from event log $L$ with $k$ set to 1.
Figure 5: A Markov process flow discovered from $L$ with $k$ set to 2.
...and 13 more figures
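The captions of Figures 4 and 5 indicate that the discovered Markov process flow depends on a parameter $k$. A plausible reading, which we assume here without confirmation from the paper, is that $k$ is the length of the activity history encoded in each state. The hypothetical `discover` function below sketches that reading: a state is the tuple of the last $k$ activities (with artificial `start` and `end` markers), and transition probabilities are relative frequencies of observed successor states.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

State = Tuple[str, ...]


def discover(traces: List[List[str]], k: int) -> Dict[State, Dict[State, float]]:
    """Hypothetical k-history discovery: a state is the last k activities observed
    (including artificial 'start'/'end' markers); transition probabilities are
    relative frequencies of observed successor states."""
    counts: Dict[State, Counter] = defaultdict(Counter)
    for trace in traces:
        padded = ["start"] + list(trace) + ["end"]
        states = [tuple(padded[max(0, i - k + 1): i + 1]) for i in range(len(padded))]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {src: {dst: n / sum(c.values()) for dst, n in c.items()}
            for src, c in counts.items()}


log = [["a", "b"], ["a", "c"], ["a", "b"]]
print(discover(log, k=1)[("a",)])  # b with probability 2/3, c with probability 1/3
```

A semi-Markov version would additionally record, for each observed transition, the time spent in the source state, so that a waiting-time distribution can be fitted per transition; durations are omitted to keep the sketch short.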

Theorems & Definitions (23)

Definition 1: Event log
Definition 2: Trace of event log
Definition 3: Trace representation of event log
Definition 4: Activity/time traces
Definition 5: Subtrace, element and trace length
Definition 6: Finite-state Markov chain
Definition 7: Irreducible Markov chain
Definition 8: State period
Definition 9: Aperiodic Markov chain
Theorem 1: Limiting probabilities (Ross85)
...and 13 more
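Definitions 7 to 9 concern irreducibility and state periods, and Theorem 1 depends on the period $d$. The period of a state is the greatest common divisor of the lengths of all paths that return to it. As an illustration of our own (not taken from the paper), the period of an irreducible chain can be computed from its transition graph with one breadth-first search: label each state with its BFS distance from a root, then take the gcd of `level[u] + 1 - level[v]` over all edges `u -> v`. The function name `period` and the adjacency-list input are assumptions.

```python
from collections import deque
from math import gcd
from typing import Dict, List


def period(adj: Dict[int, List[int]], root: int = 0) -> int:
    """Period of an irreducible chain from its transition graph (edge i -> j iff p_ij > 0).
    BFS levels from root; the period is the gcd of level[u] + 1 - level[v] over all edges."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    d = 0
    for u, successors in adj.items():
        for v in successors:
            d = gcd(d, level[u] + 1 - level[v])  # math.gcd ignores the sign
    return d


cycle = {0: [1], 1: [2], 2: [0]}            # three states visited in a fixed cycle
print(period(cycle))                         # 3
print(period({0: [0, 1], 1: [2], 2: [0]}))   # 1: the self-loop makes the chain aperiodic
```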
