Generating the Traces You Need: A Conditional Generative Model for Process Mining Data
Riccardo Graziosi, Massimiliano Ronzani, Andrei Buliga, Chiara Di Francescomarino, Francesco Folino, Chiara Ghidini, Francesca Meneghello, Luigi Pontieri
TL;DR
The paper tackles the challenge of generating process mining traces conditioned on control-flow and temporal features, highlighting the rigidity of existing DL-based generators. It introduces a conditional variational autoencoder (CVAE) architecture built on LSTMs that encodes traces with activities, timestamps, and payloads and decodes them conditioned on a variable $c$, optimizing the conditional ELBO $\mathcal{L}_{\text{CVAE}}(x,c,\theta,\phi)$ with latent $\mathbf{z}\sim q_\phi(\mathbf{z}\mid x,c)$ and prior $p(\mathbf{z}\mid c)$. The approach includes preprocessing to split timestamps into $T_1$ and interarrival times, along with normalization strategies, and a dual-path encoder with autoregressive decoders for activities and timestamps, trained with cyclical KL annealing. Evaluation on four real-world logs demonstrates that the CVAE achieves higher quality across temporal and control-flow metrics (RED, CTD, 2GD), better conformance to process constraints, and effective conditional control for targeted trace generation and what-if analyses.
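The timestamp split and the cyclical KL annealing mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the reading of $T_1$ as the first timestamp of a trace, the linear ramp, and the `ramp_fraction` parameter are all assumptions made for illustration, and the normalization strategies are omitted.

```python
from typing import List, Tuple


def split_timestamps(timestamps: List[float]) -> Tuple[float, List[float]]:
    """Split absolute event timestamps into the first timestamp (assumed
    to be T_1) and the gaps between consecutive events."""
    if not timestamps:
        raise ValueError("a trace needs at least one event")
    t1 = timestamps[0]
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return t1, gaps


def cyclical_kl_weight(step: int, cycle_length: int, ramp_fraction: float = 0.5) -> float:
    """Hypothetical linear cyclical schedule for the KL weight: within each
    cycle the weight ramps from 0 to 1 over the first `ramp_fraction` of
    the cycle, then stays at 1 until the cycle restarts."""
    position = (step % cycle_length) / cycle_length
    return min(1.0, position / ramp_fraction)
```

In a training loop, the returned weight would multiply the KL term of the conditional ELBO at each step, so the reconstruction term dominates at the start of every cycle.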
Abstract
In recent years, trace generation has emerged as a significant challenge within the Process Mining community. Deep Learning (DL) models have demonstrated accuracy in reproducing the features of the selected processes. However, current DL generative models are limited in their ability to adapt the learned distributions to generate data samples based on specific conditions or attributes. This limitation is particularly significant because the ability to control the type of generated data can be beneficial in various contexts, enabling a focus on specific behaviours, exploration of infrequent patterns, or simulation of alternative 'what-if' scenarios. In this work, we address this challenge by introducing a conditional model for process data generation based on a conditional variational autoencoder (CVAE). Conditional models offer control over the generation process by tuning input conditional variables, enabling more targeted and controlled data generation. Unlike in other domains, a CVAE for process mining faces specific challenges due to the multiperspective nature of the data and the need to adhere to control-flow rules while ensuring data variability. Specifically, we focus on generating process executions conditioned on control-flow and temporal features of the trace, allowing us to produce traces for specific, identified sub-processes. The generated traces are then evaluated using common metrics for generative model assessment, along with additional metrics to evaluate the quality of the conditional generation.
