Rethinking BPS: A Utility-Based Evaluation Framework

Konrad Özdemir; Lukas Kirchdorfer; Keyvan Amiri Elyasi; Han van der Aa; Heiner Stuckenschmidt

TL;DR

The paper critiques the standard forecasting-based evaluation of business process simulation (BPS) and its reliance on $W_1$-based metrics, which can misjudge as-is process fidelity and obscure temporal dynamics. It introduces a utility-based evaluation framework that assesses whether simulated logs preserve the predictive utility of the training data: predictive process monitoring (PPM) models are trained on both real and simulated logs, and their downstream performance is compared. The framework defines a five-step pipeline (Splitting, Simulation, PPM Training, Hold-out Evaluation, and Utility Computation) and measures UtilityLoss as the element-wise difference between the downstream-task performance of models trained on $\boldsymbol{\mathcal{L}_{\mathrm{train}}}$ and on $\boldsymbol{\mathcal{L}_{\mathrm{sim}}}$. Empirical results across multiple logs and BPS approaches show that the method can diagnose which process perspectives are affected (control-flow, resource, temporal, congestion), separate model accuracy from data complexity, and support more targeted model improvements and benchmarking.
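The five steps named above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the log is a plain list of activity sequences, the "PPM model" is a hypothetical most-frequent-successor predictor, the simulated log is written by hand in place of a real BPS model, and the sign convention (train minus sim, so a positive value means lost utility) is an assumption.

```python
from collections import Counter, defaultdict


def split_log(log, train_fraction=0.8):
    """Step 1 (Splitting): chronological split; traces are assumed time-ordered."""
    cut = int(len(log) * train_fraction)
    return log[:cut], log[cut:]


def train_next_activity_model(log):
    """Step 3 (PPM Training): toy predictor mapping an activity to its most frequent successor."""
    successors = defaultdict(Counter)
    for trace in log:
        for current, nxt in zip(trace, trace[1:]):
            successors[current][nxt] += 1
    return {act: counts.most_common(1)[0][0] for act, counts in successors.items()}


def next_activity_accuracy(model, log):
    """Step 4 (Hold-out Evaluation): share of correctly predicted next activities."""
    pairs = [(cur, nxt) for trace in log for cur, nxt in zip(trace, trace[1:])]
    if not pairs:
        return 0.0
    return sum(model.get(cur) == nxt for cur, nxt in pairs) / len(pairs)


def utility_loss(perf_train, perf_sim):
    """Step 5 (Utility Computation): element-wise difference per downstream task.

    Assumed sign convention: train minus sim, for higher-is-better metrics.
    """
    return {task: perf_train[task] - perf_sim[task] for task in perf_train}


# Hypothetical real log: the A -> B branch dominates.
log = [["A", "B", "D"]] * 6 + [["A", "C", "D"]] * 2 + [["A", "B", "D"]] * 2
train_log, test_log = split_log(log)

# Step 2 (Simulation): hand-written stand-in for a BPS model's output that
# over-produces the A -> C branch.
sim_log = [["A", "C", "D"]] * 6 + [["A", "B", "D"]] * 2

perf_train = {"next_activity_accuracy": next_activity_accuracy(
    train_next_activity_model(train_log), test_log)}
perf_sim = {"next_activity_accuracy": next_activity_accuracy(
    train_next_activity_model(sim_log), test_log)}

loss = utility_loss(perf_train, perf_sim)
print(loss)  # {'next_activity_accuracy': 0.5}
```

Both models are scored on the same real hold-out log, so the difference reflects what the simulated log fails to teach a downstream model. In this toy case the loss points at a control-flow error in the simulated branching.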

Abstract

Business process simulation (BPS) is a key tool for analyzing and optimizing organizational workflows, supporting decision-making by estimating the impact of process changes. The reliability of such estimates depends on the ability of a BPS model to accurately mimic the process under analysis, making rigorous accuracy evaluation essential. However, the state-of-the-art approach to evaluating BPS models has two key limitations. First, it treats simulation as a forecasting problem, testing whether models can predict unseen future events. This fails to assess how well a model captures the as-is process, particularly when process behavior changes between the training and test periods. As a result, it becomes difficult to determine whether poor results stem from an inaccurate model or from the inherent complexity of the data, such as unpredictable drift. Second, the evaluation approach relies strongly on Earth Mover's Distance-based metrics, which can obscure temporal patterns and thus yield misleading conclusions about simulation quality. To address these issues, we propose a novel framework that evaluates simulation quality based on its ability to generate representative process behavior. Instead of comparing simulated logs to future real-world executions, we evaluate whether predictive process monitoring models trained on simulated data perform comparably to those trained on real data for downstream analysis tasks. Empirical results show that our framework not only helps identify sources of discrepancies but also distinguishes between model accuracy and data complexity, offering a more meaningful way to assess BPS quality.
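One way a distribution-level distance can hide temporal structure is shown in the sketch below. It is an illustrative assumption, not necessarily the failure mode analyzed in the paper: the waiting-time values are made up, and the helper computes the 1-D Wasserstein-1 (Earth Mover's) distance only for equal-sized samples.

```python
def wasserstein_1(xs, ys):
    """W1 between two equal-sized 1-D samples: mean absolute gap between sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical waiting times in case-arrival order.
real_waits = [1, 1, 1, 1, 9, 9, 9, 9]  # congestion builds up late in the log
sim_waits = [9, 1, 9, 1, 9, 1, 9, 1]   # same values, but no build-up over time

distance = wasserstein_1(real_waits, sim_waits)
print(distance)  # 0.0
```

The two sequences have identical value distributions, so the distance is zero even though only the first one shows congestion building up over time.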
