An Experimental Comparison of Alternative Techniques for Event-Log Augmentation

Alessandro Padella, Francesco Vinci, Massimiliano de Leoni

TL;DR

Process mining relies on event logs, yet data scarcity hampers machine- and deep-learning approaches. The authors perform an extensive empirical comparison of seven event-log augmentation techniques against a baseline based on a stochastic transition system, across eight logs, evaluating similarity, preservation of predictive information, entropy-driven diversity, and computation time. They find that the baseline and RIMS excel in different facets: the baseline in control-flow, resources, and speed, and RIMS in time- and congestion-related accuracy. CVAE boosts control-flow similarity at the cost of computation time, and SMOTE generally underperforms because it ignores process constraints. The study suggests that combining fast, accurate control-flow generation with detailed resource-time modeling could yield high-fidelity synthetic logs with strong utility for predictive process monitoring.

Abstract

Process mining analyzes and improves processes by examining transactional data stored in event logs, which record sequences of events with timestamps. However, the effectiveness of process mining, especially when combined with machine or deep learning, depends on having large event logs. Event-log augmentation addresses this limitation by generating additional traces that simulate realistic process executions while considering various perspectives such as time, control-flow, workflow, resources, and domain-specific attributes. Although prior research has explored event-log augmentation techniques, there has been no comprehensive comparison of their effectiveness. This paper reports on an evaluation of seven state-of-the-art augmentation techniques across eight event logs. The results are also compared with those obtained by a baseline technique based on a stochastic transition system. The comparison has been carried out by analyzing four different aspects: similarity, preservation of predictive information, information loss/enhancement, and computation time. Results show that, considering the different criteria, a technique based on a stochastic transition system combined with resource queue modeling would provide higher-quality synthetic event logs. Event-log augmentation techniques are also compared with traditional data-augmentation techniques, showing that the former provide significant benefits, whereas the latter fail to consider process constraints.

Paper Structure

This paper contains 26 sections, 12 equations, 2 figures, 10 tables.

Figures (2)

  • Figure 1: Overview of the Baseline technique. In a first Discovery phase, an input event log is used for generating a stochastic transition system with various perspectives. Time distributions are derived from the event log. In the Generation phase, the transition system is used for generating traces, while the inter-arrival distribution is used for sampling the starting timestamp of each trace. These result in a synthetic event log.
  • Figure 2: Average values across the generated logs of the 8 case studies reported in Section \ref{subsec:use_cases}, for different values of the parameter $k$. Different values of Trace Entropy are reported on the y-axis, while different values of CFLD are reported on the x-axis.
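
The two phases described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the control-flow perspective, it assumes that a state is the tuple of the last few activities (the `horizon` parameter is a hypothetical name and is not claimed to be the paper's $k$), and it assumes exponentially distributed inter-arrival times. The names `discover`, `generate`, `sample_start_times`, and `END` are likewise introduced here for illustration only.

```python
import random
from collections import Counter, defaultdict

END = "<END>"  # hypothetical end-of-trace marker, not taken from the paper


def discover(log, horizon=2):
    """Discovery phase (sketch): count observed transitions per state.

    A state is assumed to be the tuple of the last `horizon` activities.
    """
    counts = defaultdict(Counter)
    for trace in log:
        state = ()
        for activity in list(trace) + [END]:
            counts[state][activity] += 1
            if activity != END:
                state = (state + (activity,))[-horizon:]
    return counts


def generate(counts, horizon=2, rng=None, max_len=100):
    """Generation phase (sketch): walk the transition system, choosing each
    next activity with probability proportional to its observed frequency."""
    rng = rng or random.Random()
    state, trace = (), []
    while len(trace) < max_len:
        options = counts[state]
        activities = list(options)
        weights = [options[a] for a in activities]
        nxt = rng.choices(activities, weights=weights)[0]
        if nxt == END:
            break
        trace.append(nxt)
        state = (state + (nxt,))[-horizon:]
    return trace


def sample_start_times(mean_gap, n, rng=None, start=0.0):
    """Sample n trace start times, assuming exponential inter-arrival gaps."""
    rng = rng or random.Random()
    times, current = [], start
    for _ in range(n):
        current += rng.expovariate(1.0 / mean_gap)
        times.append(current)
    return times


if __name__ == "__main__":
    toy_log = [["a", "b", "c"], ["a", "c"]]  # toy data, not from the paper
    system = discover(toy_log)
    rng = random.Random(0)
    for start in sample_start_times(mean_gap=5.0, n=3, rng=rng):
        print(round(start, 2), generate(system, rng=rng))
```

Because every sampled transition was observed in the input log, each generated trace follows a path through states seen during discovery. A fuller version would attach resources and activity durations to the transitions, which this sketch leaves out.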

Theorems & Definitions (12)

  • Definition 3.1: Events
  • Definition 3.2: Traces & Event Logs
  • Definition 3.3: Activity Duration
  • Definition 3.4: Handover of Work
  • Definition 5.1
  • Definition 5.2
  • Definition 5.3
  • Definition 5.4: Discretized Entropy
  • Definition 5.5: Trace Entropy
  • Definition 5.6: Prefix Entropy
  • ...and 2 more