Stochastic Directly-Follows Process Discovery Using Grammatical Inference

Hanan Alkhammash, Artem Polyvyanyy, Alistair Moffat

TL;DR

The paper addresses the challenge of discovering compact, frequency-aware process models from event logs. It introduces Stochastic Directed Action Graphs (SDAGs) and grounds them in stochastic language theory, learning them via stochastic grammar inference with ALERGIA, enhanced by a genetic algorithm (GASPD) to optimize inference parameters. The approach yields smaller graphs that better encode observed traces and their frequencies, while enabling reasoning about trace likelihoods. This work advances process mining by providing formal semantics for SDAGs, a scalable inference framework, and practical improvements in model parsimony and fidelity, with implications for stochastic simulation and decision support in process design.

Abstract

Starting with a collection of traces generated by process executions, process discovery is the task of constructing a simple model that describes the process, where simplicity is often measured in terms of model size. The challenge of process discovery is that the process of interest is unknown, and that while the input traces constitute positive examples of process executions, no negative examples are available. Many commercial tools discover Directly-Follows Graphs, in which nodes represent the observable actions of the process, and directed arcs indicate execution order possibilities over the actions. We propose a new approach for discovering sound Directly-Follows Graphs that is grounded in grammatical inference over the input traces. To promote the discovery of small graphs that also describe the process accurately we design and evaluate a genetic algorithm that supports the convergence of the inference parameters to the areas that lead to the discovery of interesting models. Experiments over real-world datasets confirm that our new approach can construct smaller models that represent the input traces and their frequencies more accurately than the state-of-the-art technique. Reasoning over the frequencies of encoded traces also becomes possible, due to the stochastic semantics of the action graphs we propose, which, for the first time, are interpreted as models that describe the stochastic languages of action traces.
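To make the notion of a Directly-Follows Graph concrete, the following minimal sketch counts, for a toy event log, how often one action directly follows another. It only illustrates the basic graph structure described in the abstract; it is not the grammatical-inference approach the paper proposes. The function name `directly_follows_counts` and the toy log are hypothetical.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often action b directly follows action a across all traces."""
    counts = Counter()
    for trace in traces:
        # Pair each action with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy log: each trace is a sequence of action names.
log = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "c"],
]

# Each key is an arc (source action, target action); each value is its frequency.
dfg = directly_follows_counts(log)
```

Each key of `dfg` corresponds to a directed arc between two action nodes, and the count gives a simple frequency annotation for that arc.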

Paper Structure (3 sections)

This paper contains 3 sections.

Theorems & Definitions (1)

  • Definition 1.3.1: Stochastic deterministic finite automata
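
The text of Definition 1.3.1 is not reproduced in this listing, so the sketch below follows the common textbook formulation of a stochastic deterministic finite automaton rather than the paper's exact notation: each state has at most one outgoing transition per action, every transition carries a probability, every state has a termination probability, and for each state these values sum to one. The class name `SDFA`, its fields, and the example automaton are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SDFA:
    """Illustrative stochastic deterministic finite automaton (assumed layout)."""
    start: str
    # transitions[state][action] = (next_state, probability)
    transitions: dict = field(default_factory=dict)
    # final[state] = probability of terminating in that state
    final: dict = field(default_factory=dict)

    def is_consistent(self, tol=1e-9):
        """Check that outgoing and termination probabilities sum to 1 per state."""
        states = set(self.transitions) | set(self.final) | {self.start}
        for s in states:
            outgoing = sum(p for _, p in self.transitions.get(s, {}).values())
            if abs(outgoing + self.final.get(s, 0.0) - 1.0) > tol:
                return False
        return True

    def trace_probability(self, trace):
        """Multiply transition probabilities along the trace, then terminate."""
        state, prob = self.start, 1.0
        for action in trace:
            step = self.transitions.get(state, {}).get(action)
            if step is None:
                return 0.0  # the automaton cannot replay this trace
            state, p = step
            prob *= p
        return prob * self.final.get(state, 0.0)


# Hypothetical automaton: "a" is mandatory, then "b" may repeat any number of times.
model = SDFA(
    start="q0",
    transitions={
        "q0": {"a": ("q1", 1.0)},
        "q1": {"b": ("q1", 0.5)},
    },
    final={"q1": 0.5},
)

p_a = model.trace_probability(["a"])
p_ab = model.trace_probability(["a", "b"])
p_b = model.trace_probability(["b"])
```

Because the automaton is deterministic, each trace follows at most one path, so its probability is a single product rather than a sum over paths. This is what makes reasoning about trace frequencies straightforward in such models.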