
Ranking the Top-K Realizations of Stochastically Known Event Logs

Arvid Lepsien, Marco Pegoraro, Frederik Fonger, Dominic Langhammer, Milda Aleknonytė-Resch, Agnes Koschmider

TL;DR

The paper tackles uncertainty in event logs by formalizing stochastically known logs that encode multiple realizations and proposing an efficient top-$K$ realization ranking under event independence. It introduces a BST-based algorithm with ALG-1P and ALG-R2P to enumerate the $K$ most probable realizations, achieving a theoretical complexity of $O(K \cdot |\tilde{L}|)$. Empirical evaluations show that top-$K$ realizations cover substantially more probability mass than the top-1 interpretation, particularly for small-to-moderate logs, while remaining scalable to larger logs. The findings support integrating top-$K$ rankings into uncertainty-aware process mining, with future work aimed at handling dependencies and diversifying outputs to further improve interpretability and applicability.
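As a rough illustration of why the top-1 interpretation can cover so little probability mass, consider the following sketch. The notation is assumed for illustration and is not taken from the paper: $n$ uncertain events, where event $i$ has a set of alternatives $A_i$ with probabilities $p_i(a)$ summing to 1, and a realization $L$ that selects one alternative $a_i \in A_i$ per event. Under event independence, the realization probability factorizes and the number of realizations is a product of the alternative counts:

$$
P(L) = \prod_{i=1}^{n} p_i(a_i), \qquad \#\,\text{realizations} = \prod_{i=1}^{n} |A_i| .
$$

With hypothetical numbers, say $n = 100$ events, each with 3 alternatives and a most probable alternative of probability $0.7$, this gives

$$
3^{100} \approx 5.2 \times 10^{47} \ \text{realizations}, \qquad P(L_1) = 0.7^{100} \approx 3.2 \times 10^{-16} .
$$

In such a setting even the single most probable realization carries a vanishing share of the total mass, which is the gap a top-$K$ ranking is meant to narrow.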

Abstract

Various kinds of uncertainty can occur in event logs, e.g., due to flawed recording, data quality issues, or the use of probabilistic models for activity recognition. Stochastically known event logs make these uncertainties transparent by encoding multiple possible realizations for events. However, the number of realizations encoded by a stochastically known log grows exponentially with its size, making exhaustive exploration infeasible even for moderately sized event logs. Thus, considering only the top-$K$ most probable realizations has been proposed in the literature. In this paper, we implement an efficient algorithm to calculate a top-$K$ realization ranking of an event log under event independence within $O(Kn)$, where $n$ is the number of uncertain events in the log. This algorithm is used to investigate the benefit of top-$K$ rankings over top-1 interpretations of stochastically known event logs. Specifically, we analyze the usefulness of top-$K$ rankings against different properties of the input data. We show that the benefit of a top-$K$ ranking depends on the length of the input event log and the distribution of the event probabilities. The results highlight the potential of top-$K$ rankings to enhance uncertainty-aware process mining techniques.
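To make the idea of a top-$K$ realization ranking under event independence concrete, here is a minimal Python sketch. It is not the paper's ALG-1P or ALG-R2P, and it does not achieve the stated $O(Kn)$ bound: it is a plain best-first search with a heap, swapped in for illustration. The function name `top_k_realizations` and the input encoding (one dict of activity label to probability per event, all probabilities strictly positive) are assumptions made here.

```python
import heapq
import math


def top_k_realizations(events, k):
    """Return up to k (labels, probability) pairs, most probable first.

    events: list of dicts mapping an activity label to its probability
    (hypothetical encoding; probabilities are assumed strictly positive).
    """
    # Rank each event's alternatives by descending probability.
    ranked = [sorted(e.items(), key=lambda kv: -kv[1]) for e in events]

    def cost(idx):
        # Negative log-probability of the realization selected by idx;
        # under independence, log-probabilities simply add up.
        return -sum(math.log(ranked[i][j][1]) for i, j in enumerate(idx))

    start = (0,) * len(ranked)  # best alternative for every event
    heap = [(cost(start), start)]
    seen = {start}
    result = []
    while heap and len(result) < k:
        c, idx = heapq.heappop(heap)
        labels = tuple(ranked[i][j][0] for i, j in enumerate(idx))
        result.append((labels, math.exp(-c)))
        # Successors demote exactly one event to its next-best alternative,
        # so a successor is never more probable than its parent.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (cost(nxt), nxt))
    return result


# Toy log: two uncertain events and one certain event (made-up numbers).
toy_log = [{"a": 0.7, "b": 0.3}, {"c": 0.6, "d": 0.4}, {"e": 1.0}]
for labels, prob in top_k_realizations(toy_log, 3):
    print(labels, round(prob, 4))
```

Summing the returned probabilities gives the share of probability mass covered by the ranking, which is the kind of quantity the evaluation compares against the top-1 interpretation.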

Paper Structure

This paper contains 12 sections, 4 theorems, 5 figures, 2 tables, 1 algorithm.

Key Result

lemma 1

For all $\mathcal{D}_{I, O} \subseteq \mathcal{D}$, if a second best solution $L_q^2 \in \mathcal{D}_{I, O}$ with $q \in \{1, \ldots, K\}$ exists, there always exists a second best solution $L' \in \mathcal{D}_{I, O}$ that is different from the best solution $L_q \in \mathcal{D}_{I, O}$ in exactly o

Figures (5)

  • Figure 1: Ranking measures $P(L_1)$, $F_K(K)$, $t$ and $d_{\mathit{avg}}$ for varying $K$. ($n_{\mathit{events}} = 100,\, r = 0.3,\, n_{\mathit{act}} = 3$ and $\beta = 0.3$; log-scaled y-axis for $P(L_1)$ and $F_K(K)$)
  • Figure 2: Ranking measures $P(L_1)$, $F_K(K)$, $t$ and $d_{\mathit{avg}}$ for varying $n_{\mathit{events}}$. ($r = 0.3,\, n_{\mathit{act}} = 3,\, \beta = 0.3$ and $K = 10^4$; log-scaled y-axis for $P(L_1)$ and $F_K(K)$)
  • Figure 3: Ranking measures $P(L_1)$, $F_K(K)$, $t$ and $d_{\mathit{avg}}$ for varying $n_{\mathit{act}}$. ($n_{\mathit{events}} = 100,\, r = 0.3,\, \beta = 0.3$ and $K = 10^4$; log-scaled y-axis for $P(L_1)$ and $F_K(K)$)
  • Figure 4: Ranking measures $P(L_1)$, $F_K(K)$, $t$ and $d_{\mathit{avg}}$ for varying $\beta$. ($n_{\mathit{events}} = 100,\, r = 0.3,\, n_{\mathit{act}} = 3$ and $K = 10^4$; log-scaled y-axis for $P(L_1)$ and $F_K(K)$)
  • Figure : ALG-R2P

Theorems & Definitions (17)

  • definition 1: Universes [pegoraro_conformance_2021]
  • definition 2: Event, event log [pegoraro_conformance_2021]
  • definition 3: Stochastically known event, stochastically known event log [pegoraro_conformance_2021]
  • definition 4: Realizations
  • definition 5: Realization probability
  • definition 6: Top-$K$ ranking
  • definition 7: Feasible solutions [hamacher_k_1985]
  • definition 8: Realization ranking problem
  • lemma 1
  • proof
  • ...and 7 more