Discovering Process Models With Long-Term Dependencies While Providing Guarantees and Filtering Infrequent Behavior Patterns

Lisa Luise Mannel, Wil M. P. van der Aalst

TL;DR

The paper tackles the challenge of process discovery from event logs by enhancing Petri-net-based discovery with long-term dependency modeling while guaranteeing a minimal fitness. It introduces aggregated fitness and a combined fitness metric to better handle infrequent behavior, and couples this with a BFS-based place-selection framework that yields deadlock-free nets that still replay at least a user-specified fraction of traces. The approach demonstrates strong quantitative performance (high HM and F1) across diverse real and artificial logs, while maintaining scalability through pruning via monotonicity and depth limitations. Collectively, these contributions enable more accurate, interpretable, and robust process models that capture main process flows without overfitting to rare deviations, offering practical benefits for process analytics and compliance checking.

Abstract

In process discovery, the goal is to find, for a given event log, the model describing the underlying process. While process models can be represented in a variety of ways, Petri nets form a theoretically well-explored description language and are therefore often used. In this paper, we extend the eST-Miner process discovery algorithm. The eST-Miner computes a set of Petri net places which are considered to be fitting with respect to a certain fraction of the behavior described by the given event log as indicated by a given noise threshold. It evaluates all possible candidate places using token-based replay. The set of replayable traces is determined for each place in isolation, i.e., these sets do not need to be consistent. This allows the algorithm to abstract from infrequent behavioral patterns occurring only in some traces. However, when combining places into a Petri net by connecting them to the corresponding uniquely labeled transitions, the resulting net can replay exactly those traces from the event log that are allowed by the combination of all inserted places. Thus, inserting places one-by-one without considering their combined effect may result in deadlocks and low fitness of the Petri net. In this paper, we explore adaptations of the eST-Miner that aim to select a subset of places such that the resulting Petri net guarantees a definable minimal fitness while maintaining high precision with respect to the input event log. Furthermore, current place evaluation techniques tend to block the execution of infrequent activity labels. Thus, a refined place fitness metric is introduced and thoroughly investigated. In our experiments, we use real and artificial event logs to evaluate and compare the impact of the various place selection strategies and place fitness evaluation metrics on the returned Petri net.
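
The gap between evaluating places in isolation and combining them can be illustrated with a minimal Python sketch of token-based replay. It uses the event log from Figure 2 (40 traces of one variant, 60 of the other). The two places `p_ab` and `p_ba`, the labels `"start"` and `"end"`, and the helper names `replay_place` and `fitness` are assumptions made for illustration; they are not taken from the paper and need not match its places $p_6$ and $p_7$.

```python
from collections import Counter

def replay_place(trace, inputs, outputs):
    """Replay one trace on a single place (inputs | outputs).

    Transitions in `inputs` produce a token, transitions in `outputs`
    consume one. The trace is replayable if the token count never
    becomes negative and the place is empty at the end.
    """
    tokens = 0
    for activity in trace:
        if activity in outputs:      # consume first, then produce
            tokens -= 1
            if tokens < 0:           # token missing: trace is blocked
                return False
        if activity in inputs:
            tokens += 1
    return tokens == 0               # leftover tokens also count as unfitting

def fitness(log, places):
    """Fraction of traces replayable by ALL given places together."""
    total = sum(log.values())
    fitting = sum(
        count
        for trace, count in log.items()
        if all(replay_place(trace, i, o) for i, o in places)
    )
    return fitting / total

# Event log from Figure 2: two variants with frequencies 40 and 60.
log = Counter({
    ("start", "a", "b", "end"): 40,
    ("start", "b", "a", "end"): 60,
})

# Hypothetical places (assumption, not the paper's p_6 and p_7).
p_ab = ({"a"}, {"b"})   # a must happen before b
p_ba = ({"b"}, {"a"})   # b must happen before a

print(fitness(log, [p_ab]))        # 0.4
print(fitness(log, [p_ba]))        # 0.6
print(fitness(log, [p_ab, p_ba]))  # 0.0
```

Under a noise threshold of, say, 0.4, each of these places would pass when checked on its own, yet no trace of the log survives once both are inserted. This is the kind of combined effect that a place-selection strategy with a guaranteed minimal fitness has to rule out.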

Paper Structure

This paper contains 16 sections, 5 theorems, 20 equations, 24 figures, 3 tables.

Key Result

Theorem 3.7

Consider two places $p_1 = (I_1 \mid O_1)$ and $p_2 = (I_2 \mid O_2)$. Then the following holds for any event log $L \in \mathbb{M}(\mathcal{T})$:

Figures (24)

  • Figure 1: The behavior in event log $L$ corresponds in large parts to the sequential Petri net below. However, in all traces some deviations in activity order occur (marked in red). Since all traces and all activities are equally frequent, it is not possible to filter infrequent behavior patterns and discover the underlying main process structure by simply removing infrequent traces or activities in a preprocessing step. This becomes even more challenging for processes that include concurrency, choice or non-free choice constructs.
  • Figure 2: Consider the event log $L=[\langle \blacktriangleright, a, b, \blacksquare \rangle^{40}, \langle \blacktriangleright, b, a, \blacksquare \rangle^{60}]$, where the first trace variant occurs $40$ times and the second one $60$ times. Considered in isolation, place $p_6$ allows for the first sequence of activities while place $p_7$ allows for the second. However, in combination they cause a deadlock in the Petri net.
  • Figure 3: Petri net with $A = \{\blacktriangleright, a, b, c, \blacksquare\}$ and $P=\{(\blacktriangleright \mid a,b), (\blacktriangleright \mid a,c), (a,b \mid \blacksquare), (a,c \mid \blacksquare), (\emptyset \mid \blacktriangleright), (\blacksquare \mid \emptyset)\}$.
  • Figure 4: Example of a tree-structured candidate space for the set of activities $\{\blacktriangleright, a, b, \blacksquare\}$, with orderings $\blacksquare >_i b >_i a >_i \blacktriangleright$ and $\blacksquare >_o b >_o a >_o \blacktriangleright$.
  • Figure 5: Illustrating the fitness status a candidate place can have with respect to an event log and noise threshold. A place can either be fitting or unfitting. If it is unfitting, it may be underfed, overfed or both, enabling the skipping of derived candidate places. It may also be unfitting (Definition \ref{def:fitnessthreshold}) without satisfying the threshold to be underfed or overfed (Definition \ref{def:fednessFM}), in which case no candidates can be skipped based on it.
  • ...and 19 more figures

Theorems & Definitions (25)

  • Definition 2.1: Activity, Trace, Event Log
  • Definition 2.2: Places and Petri nets
  • Definition 2.3: Replayable Traces
  • Definition 2.4: Behavior of a Petri net
  • Definition 3.1: Fitting, Underfed and Overfed Places (cf. monotonicity)
  • Definition 3.2: Multisets of Fitting/Underfed/Overfed Traces (cf. monotonicity)
  • Definition 3.3: Measuring Fitness of a Place
  • Definition 3.4: Activated Traces (cf. monotonicity)
  • Definition 3.5: Fitness Metrics (cf. monotonicity)
  • Definition 3.6: Fitness with Respect to a Threshold (cf. monotonicity)
  • ...and 15 more