Mind the Gap: A Formal Investigation of the Relationship Between Log and Model Complexity -- Extended Version
Patrizia Schalk, Artem Polyvyanyy
TL;DR
The paper targets a long-standing question in process mining: do formal log complexity measures predict the complexity of the models discovered from those logs? By formalizing a broad set of log and model complexity measures and evaluating them across multiple discovery algorithms (including Flower Model, Trace Net, Alpha Miner, and Directly Follows variants), it finds that predictability is limited. The Flower Model provides monotone links between log variety and several model complexity scores, but most measures remain non-predictive for most algorithms; in many cases, an increase in log complexity does not imply a more complex model. These results emphasize the need for algorithm-specific guidelines and transparent reporting from developers of discovery methods, and the authors provide a public tool to reproduce their analyses. Overall, the work highlights gaps in the theoretical foundations and offers a concrete direction for understanding and predicting the complexity of discovered process models.
Abstract
Simple process models are key for effectively communicating the outcomes of process mining. An important question in this context is whether the complexity of event logs used as inputs to process discovery algorithms can serve as a reliable indicator of the complexity of the resulting process models. Although various complexity measures for both event logs and process models have been proposed in the literature, the relationship between input and output complexity remains largely unexplored. In particular, there are no established guidelines or theoretical foundations that explain how the complexity of an event log influences the complexity of the discovered model. This paper examines whether formal guarantees exist such that increasing the complexity of event logs leads to increased complexity in the discovered models. We study 18 log complexity measures and 17 process model complexity measures across five process discovery algorithms. Our findings reveal that only the complexity of the flower model can be established by an event log complexity measure. For all other algorithms, we investigate which log complexity measures influence the complexity of the discovered models. The results show that current log complexity measures are insufficient to decide which discovery algorithms to choose to construct simple models. We propose that authors of process discovery algorithms provide insights into which log complexity measures predict the complexity of their results.
