Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing
David Chapela-Campa, Marlon Dumas
TL;DR
This paper presents an online method for computing the state of ongoing process cases. Offline, it generates a complete pure reachability graph of the process model and builds an $n$-gram index that maps the last $n$ observable activities to process markings. At runtime, the state of a trace prefix is obtained by looking up its ending $n$-gram(s) in the index, in time that is constant with respect to the trace length, which enables fast log animation and short-term simulation. The approach achieves accuracy comparable to prefix-alignment while delivering a throughput of hundreds of thousands of traces per second on real-life datasets, and it is robust to noise on synthetic benchmarks. The work trades some expressiveness for substantial online efficiency and deterministic state retrieval, with clear avenues for extension to partial matching and graph-database storage.
Abstract
This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the process model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
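The indexing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the reachability graph is assumed to be given as a plain dictionary, and the names `build_index`, `state_of`, and the padding symbol `START` are hypothetical. Each sequence of $n$ activity labels that can be followed in the graph is mapped to the set of states in which it can end, so a lookup only inspects the last $n$ events of a prefix.

```python
from collections import defaultdict

START = "<start>"  # hypothetical padding symbol for prefixes shorter than n


def build_index(edges, initial, n):
    """Map every n-gram of activities to the set of states it can end in.

    edges: dict mapping a state to a list of (activity, next_state) pairs
           (an assumed encoding of the reachability graph).
    """
    index = defaultdict(set)

    def walk(state, gram):
        if len(gram) == n:
            index[gram].add(state)
            return
        for activity, target in edges.get(state, []):
            walk(target, gram + (activity,))

    # n-grams that can occur anywhere in a run of the model.
    for state in edges:
        walk(state, ())
    # Padded n-grams for runs that start in the initial state
    # and contain fewer than n activities.
    for k in range(1, n + 1):
        walk(initial, (START,) * k)
    return index


def state_of(index, prefix, n):
    """Look up the candidate states of a trace prefix (a list of activities).

    Only the last n events are read, so the cost does not grow with the
    length of the prefix. An empty set means the n-gram is not in the index.
    """
    tail = tuple(prefix[-n:])
    gram = (START,) * (n - len(tail)) + tail
    return index.get(gram, set())


# Toy reachability graph (illustrative only): A, then B or C, then D.
EDGES = {
    "s0": [("A", "s1")],
    "s1": [("B", "s2"), ("C", "s2")],
    "s2": [("D", "s3")],
}
INDEX = build_index(EDGES, "s0", n=2)
```

In this toy example, `state_of(INDEX, ["A", "X", "B", "D"], 2)` still resolves to `{"s3"}` even though the event `X` does not fit the model, because only the ending 2-gram `("B", "D")` is consulted. Returning a set rather than a single state is a simplification of this sketch; how ambiguous $n$-grams are resolved is not covered here.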
