A Framework for Streaming Event-Log Prediction in Business Processes

Benedikt Bollig, Matthias Függer, Thomas Nowak

TL;DR

This work tackles streaming event-log prediction in business processes by casting base language models as automata, namely probabilistic and frequency deterministic finite automata (PDFAs/FDFAs), and unifying them under an automata-based framework. It introduces a Python-based framework that supports both batch and streaming prediction, enabling easy composition of base models (e.g., frequency prefix trees (FPTs), $n$-grams, bags) and ensembles of them via soft, hard, or adaptive voting. Experimental results on seven real-world datasets show that LSTMs dominate in batch mode, but in streaming mode simple models augmented with ensemble methods often match or exceed LSTM performance, with substantially lower latency. The framework thus enables real-time, robust decision support in process mining and motivates further exploration of fallback strategies, online hyperparameter tuning, and global–local behavior modeling.
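As a rough illustration of the voting schemes, the sketch below combines next-activity distributions, one dict per base model. Soft voting averages the distributions and picks the most probable activity; hard voting lets each model vote for its own top activity and takes the majority. The adaptive rule shown here is an assumption made for illustration (each model's vote is weighted by its running accuracy); the framework's actual adaptive voting may differ, and all function names are hypothetical.

```python
from collections import Counter


def soft_vote(distributions):
    """Average the models' next-activity distributions and return the argmax."""
    total = Counter()
    for dist in distributions:
        for act, p in dist.items():
            total[act] += p / len(distributions)
    return max(total, key=total.get) if total else None


def hard_vote(distributions):
    """Each model votes for its own most likely activity; the majority wins."""
    votes = Counter(max(d, key=d.get) for d in distributions if d)
    return votes.most_common(1)[0][0] if votes else None


def adaptive_vote(distributions, accuracies):
    """Hypothetical adaptive rule: weight each model's vote by its running accuracy."""
    weights = Counter()
    for d, acc in zip(distributions, accuracies):
        if d:
            weights[max(d, key=d.get)] += acc
    return max(weights, key=weights.get) if weights else None


# Three base models' predicted distributions for the next activity.
dists = [{"a": 0.6, "b": 0.4}, {"a": 0.55, "b": 0.45}, {"a": 0.1, "b": 0.9}]
print(soft_vote(dists), hard_vote(dists))  # b a
```

Hard voting picks `a` because two of the three models prefer it, while soft voting picks `b` because the third model is far more confident than the other two are.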

Abstract

We present a Python-based framework for event-log prediction in streaming mode, enabling predictions while data is being generated by a business process. The framework allows for easy integration of streaming algorithms, including language models such as n-grams and LSTMs, and for combining these predictors using ensemble methods. Using our framework, we conducted experiments on various well-known process-mining datasets and compared classical batch mode with streaming mode. Although LSTMs generally achieve the best performance in batch mode, there is often an n-gram whose accuracy comes very close. Ensembles of basic models can even outperform LSTMs. The value of basic models relative to LSTMs becomes even more apparent in streaming mode, where LSTMs generally lack accuracy in the early stages of a prediction run, while basic methods make sensible predictions immediately.
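To make the streaming setting concrete, here is a minimal sketch of an n-gram next-activity predictor that learns incrementally from an interleaved stream of (case id, activity) events, together with a test-then-train loop that yields accuracy over the number of processed activities. The class and function names are hypothetical, each case's state is assumed to be its last n-1 activities, and end-of-case (stop) handling is omitted for brevity. The model only answers `None` until it has seen a context once, which illustrates why such basic models can make sensible predictions early in a run.

```python
from collections import Counter, defaultdict


class StreamingNGram:
    """Hypothetical streaming n-gram predictor (sketch, not the framework's API)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context -> next-activity frequencies
        self.state = {}                     # case id -> current context (tuple)

    def predict(self, case_id):
        """Most frequent next activity seen so far in this case's context, or None."""
        seen = self.counts.get(self.state.get(case_id, ()))
        return seen.most_common(1)[0][0] if seen else None

    def update(self, case_id, activity):
        """Learn from one observed event, then advance the case's context."""
        ctx = self.state.get(case_id, ())
        self.counts[ctx][activity] += 1
        k = self.n - 1
        self.state[case_id] = (ctx + (activity,))[-k:] if k > 0 else ()


def accuracy_curve(model, stream):
    """Test-then-train: predict each event before learning from it.

    Returns the cumulative accuracy after each processed activity.
    """
    correct, curve = 0, []
    for t, (case_id, activity) in enumerate(stream, start=1):
        correct += model.predict(case_id) == activity
        model.update(case_id, activity)
        curve.append(correct / t)
    return curve


# Events from three cases arrive interleaved, as in a live process.
stream = [("c1", "a"), ("c2", "a"), ("c1", "b"),
          ("c2", "b"), ("c3", "a"), ("c3", "b")]
print(accuracy_curve(StreamingNGram(n=2), stream))
```

Keeping one small context per open case is what makes the update constant-time per event; a longer-running version would also need to evict states of finished cases.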

Paper Structure

This paper contains 16 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Automata obtained for the event log $L = \{ \langle a \rangle^5, \langle aa \rangle^3, \langle aaa \rangle^3, \langle aab \rangle^1, \langle aaaa \rangle^1, \langle b \rangle^9, \langle ba \rangle^1, \langle bb \rangle^5, \langle bba \rangle^1, \langle bbb \rangle^1 \}$, where powers denote the multiplicity of a sequence. States (nodes) and transitions upon activities (arrows) are shown. Frequencies (respectively, probabilities) of activities are given in brackets next to the activities, and within the nodes for the $\mathsf{stop}$ activity. (a) FPT; (b) FDFA for the 3-gram; (c) PDFA for the 3-gram; (d) FDFA for the 3-gram enriched with the current states of cases 123, 453, and 721, as maintained during inference and streaming learning.
  • Figure 2: Prediction accuracy of different language models in streaming mode over the number of activities, on the Sepsis dataset [sepsis_cases].
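The frequencies behind Figure 1 can be recomputed mechanically from $L$. The sketch below is a hedged illustration rather than the framework's code: it builds next-activity frequency tables keyed by state. Using the full prefix as the state yields a frequency prefix tree, and using the last $n-1$ activities as the state yields an $n$-gram FDFA (an assumption about how states are defined). Normalizing each state's frequencies gives the PDFA probabilities. Function names are hypothetical, and activities are assumed to be single characters.

```python
from collections import Counter, defaultdict
from fractions import Fraction

STOP = "stop"  # the special end-of-case activity

# Event log L from Figure 1 as (trace, multiplicity); one character per activity.
LOG = [("a", 5), ("aa", 3), ("aaa", 3), ("aab", 1), ("aaaa", 1),
       ("b", 9), ("ba", 1), ("bb", 5), ("bba", 1), ("bbb", 1)]


def frequency_automaton(log, n=None):
    """Return state -> Counter of next activities (including STOP).

    n=None: a state is the whole prefix read so far (frequency prefix tree).
    n=k:    a state is the last k-1 activities (assumed k-gram FDFA).
    """
    counts = defaultdict(Counter)
    for trace, mult in log:
        for i in range(len(trace) + 1):
            prefix = trace[:i]
            state = prefix if n is None else prefix[max(0, i - (n - 1)):]
            nxt = trace[i] if i < len(trace) else STOP
            counts[state][nxt] += mult
    return counts


def to_probabilities(counts):
    """Normalize each state's frequencies into a distribution (PDFA view)."""
    return {state: {act: Fraction(c, sum(ctr.values())) for act, c in ctr.items()}
            for state, ctr in counts.items()}


fpt = frequency_automaton(LOG)
trigram = frequency_automaton(LOG, n=3)
print(dict(trigram["aa"]))                   # {'stop': 7, 'a': 5, 'b': 1}
print(to_probabilities(trigram)["aa"]["a"])  # 5/13
```

In this construction the 3-gram state $aa$ merges the FPT states $aa$, $aaa$, and $aaaa$, so it is visited 13 times (7 stops, 5 further $a$, 1 $b$) and the PDFA assigns $a$ probability $5/13$ there.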

Theorems & Definitions (1)

  • definition 1