Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing

David Chapela-Campa, Marlon Dumas

TL;DR

This paper presents a method for computing, online, the state of ongoing process cases. Offline, it generates a complete pure reachability graph of the process model and builds an $n$-gram index that maps the last $n$ observable activities of a trace to process markings. At runtime, the state of a trace prefix is obtained in constant time with respect to the trace length by looking up the prefix's ending $m$-gram(s) in the index, which enables fast log animation and short-term simulation. On real-life datasets, the approach achieves accuracy comparable to that of prefix-alignment while delivering a throughput of hundreds of thousands of traces per second; on synthetic benchmarks, it proves robust to noise. The work advances practical state computation for process mining by trading some expressiveness for substantial online efficiency and deterministic state retrieval, and it points to extensions toward partial matching and graph-database storage.
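The indexing idea can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the graph encoding (marking mapped to a list of (observable activity, next marking) pairs, with silent steps assumed already removed), the `__START__` pseudo-activity used to disambiguate short prefixes, the "shortest unique suffix" lookup strategy, and the names `build_ngram_index` and `lookup_state` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, Hashable, List, Optional, Tuple

START = "__START__"  # hypothetical pseudo-activity marking the beginning of a trace
# Hypothetical graph encoding: marking -> [(observable activity, next marking)],
# assuming silent steps have already been collapsed away.
Graph = Dict[Hashable, List[Tuple[str, Hashable]]]
Index = Dict[Tuple[str, ...], FrozenSet[Hashable]]


def build_ngram_index(graph: Graph, initial: Hashable, n_max: int) -> Index:
    """Offline: map every k-gram (1 <= k <= n_max) of observable activities to the
    set of markings in which a path ending with that k-gram can arrive."""
    index = defaultdict(set)
    index[(START,)].add(initial)  # the empty prefix leaves the case in the initial marking
    start = (initial, (START,))
    seen = {start}
    queue = deque([start])
    # The space of (marking, last n_max symbols) pairs is finite, so the walk terminates.
    while queue:
        marking, suffix = queue.popleft()
        for activity, target in graph.get(marking, []):
            new_suffix = (suffix + (activity,))[-n_max:]
            for k in range(1, len(new_suffix) + 1):
                index[new_suffix[-k:]].add(target)  # register every ending k-gram
            state = (target, new_suffix)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {gram: frozenset(ms) for gram, ms in index.items()}


def lookup_state(index: Index, prefix: List[str], n_max: int) -> Optional[FrozenSet[Hashable]]:
    """Online: return the candidate markings for a trace prefix, inspecting at most
    n_max trailing activities, so the cost does not grow with the prefix length."""
    best = None
    for k in range(1, min(n_max, len(prefix) + 1) + 1):
        # Once k exceeds the prefix length, the gram starts with the START symbol.
        gram = tuple(prefix[-k:]) if k <= len(prefix) else (START,) + tuple(prefix)
        candidates = index.get(gram)
        if candidates is None:
            break  # gram never occurs in the model, so no longer gram can occur either
        best = candidates
        if len(candidates) == 1:
            break  # the suffix identifies a unique marking
    return best


# Toy graph: activity A occurs in two different states (p0 -> p1 and p2 -> p3).
graph = {"p0": [("A", "p1")], "p1": [("B", "p2")], "p2": [("A", "p3")], "p3": [("C", "p4")]}
index = build_ngram_index(graph, "p0", n_max=3)
print(lookup_state(index, ["A"], 3))            # frozenset({'p1'})
print(lookup_state(index, ["A", "B", "A"], 3))  # frozenset({'p3'})
print(lookup_state(index, ["X", "B", "A"], 3))  # frozenset({'p3'}): early noise is ignored
```

In this sketch, the 1-gram `('A',)` is ambiguous because A occurs in two states, so the lookup widens to longer suffixes until a single marking remains. A deviating activity earlier in the prefix (`X`) does not change the result, because only the trailing activities are inspected; each lookup performs at most `n_max` dictionary probes regardless of how long the prefix is.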

Abstract

This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the process model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
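The limitation of token-based replay mentioned above can be made concrete with a simplified replay that, whenever a transition is not enabled, creates the missing input tokens and fires it anyway. This is a sketch under assumptions: the net encoding, the function name `token_replay`, and the restriction that each activity labels exactly one transition (no duplicate labels, no silent transitions) are illustrative choices, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical net encoding: transition name -> (label, input places, output places).
Net = Dict[str, Tuple[str, List[str], List[str]]]


def token_replay(net: Net, initial: Counter, prefix: List[str]) -> Tuple[Counter, int]:
    """Replay a trace prefix, force-firing each transition by creating any missing
    input tokens. Returns the final marking and the number of missing tokens."""
    # Assumes every activity labels exactly one transition (no duplicates, no silent steps).
    by_label = {label: (inputs, outputs) for label, inputs, outputs in net.values()}
    marking = Counter(initial)
    missing = 0
    for activity in prefix:
        inputs, outputs = by_label[activity]  # KeyError for activities not in the model
        for place in inputs:
            if marking[place] == 0:
                marking[place] += 1  # create a missing token to enable the transition
                missing += 1
            marking[place] -= 1
        for place in outputs:
            marking[place] += 1
    return +marking, missing  # unary plus drops places holding no tokens


# Sequential toy net: start -A-> p1 -B-> p2 -C-> end.
net = {
    "tA": ("A", ["start"], ["p1"]),
    "tB": ("B", ["p1"], ["p2"]),
    "tC": ("C", ["p2"], ["end"]),
}
print(token_replay(net, Counter({"start": 1}), ["A", "B"]))  # marking {p2: 1}, 0 missing
print(token_replay(net, Counter({"start": 1}), ["A", "C"]))  # marking {p1: 1, end: 1}, 1 missing
```

The second prefix skips B, and the replay ends with one token in `p1` and one in `end`. No firing sequence of this sequential net can reach that marking, since the net never holds more than one token at a time, which is exactly the kind of unreachable state the abstract warns about.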

Paper Structure (12 sections, 6 figures, 8 tables, 4 algorithms)

Figures (6)

  • Figure 1: Example of token-replay (a) and short-term simulation (b) scenarios.
  • Figure 2: Workflow nets of invoicing (a) and order handling (b) processes.
  • Figure 3: Transformation of an XOR-split place connected to both silent and observable transitions into an XOR-split connected to only silent transitions.
  • Figure 4: Model with XOR-split and XOR-join structures (a) and three corresponding pure reachability graphs: eager-traversing (b), lazy-traversing (c), and lazy-splits/eager-joins traversing (d) policies.
  • Figure 5: Pure reachability graph (lazy traversal policy only for XOR-splits) corresponding to the running-example model.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Definition 1: Labeled Petri net [DBLP:books/sp/Aalst16]
  • Definition 2: Labeled Workflow net [DBLP:books/sp/Aalst16]
  • Definition 3: Reachability Graph
  • Definition 4: Complete Reachability Graph
  • Definition 5: Pure Reachability Graph
  • Definition 6: Complete Pure Reachability Graph
  • Definition 7: N-gram index
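Definitions 3 to 6 are listed here only by name. As a point of reference, the following sketch builds an ordinary reachability graph of a labeled Petri net by breadth-first search over markings. It does not reproduce the paper's "complete" or "pure" variants, whose exact definitions are not given in this summary; the net encoding and the names `freeze` and `reachability_graph` are hypothetical, and the `limit` guard is an added assumption that stops the search on nets with too many (or infinitely many) markings.

```python
from collections import Counter, deque
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical net encoding: transition name -> (label, input places, output places).
Net = Dict[str, Tuple[str, List[str], List[str]]]
# A marking frozen into a hashable set of (place, token count) pairs.
Marking = FrozenSet[Tuple[str, int]]


def freeze(tokens: Counter) -> Marking:
    return frozenset((place, n) for place, n in tokens.items() if n > 0)


def reachability_graph(net: Net, initial: Counter, limit: int = 10_000):
    """Breadth-first construction of the reachability graph: nodes are the markings
    reachable from `initial`; each edge is labeled with the fired transition's label."""
    start = freeze(initial)
    nodes: Set[Marking] = {start}
    edges: List[Tuple[Marking, str, Marking]] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        tokens = Counter(dict(current))
        for label, inputs, outputs in net.values():
            need = Counter(inputs)
            if any(tokens[place] < n for place, n in need.items()):
                continue  # transition not enabled in this marking
            successor = tokens - need  # consume input tokens
            successor.update(outputs)  # produce output tokens
            target = freeze(successor)
            edges.append((current, label, target))
            if target not in nodes:
                if len(nodes) >= limit:
                    raise RuntimeError("too many markings; the net may be unbounded")
                nodes.add(target)
                queue.append(target)
    return start, nodes, edges


# Toy net: A splits into two parallel branches (B and C) that D joins.
net = {
    "tA": ("A", ["start"], ["p1", "p2"]),
    "tB": ("B", ["p1"], ["p3"]),
    "tC": ("C", ["p2"], ["p4"]),
    "tD": ("D", ["p3", "p4"], ["end"]),
}
start, nodes, edges = reachability_graph(net, Counter({"start": 1}))
print(len(nodes), len(edges))  # 6 6
```

Because B and C can fire in either order, the graph contains both interleavings, yielding six markings and six labeled edges for this four-transition net; this growth under concurrency is one reason to build such a graph offline rather than per trace.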