Towards Reproducibility in Predictive Process Mining: SPICE - A Deep Learning Library

Oliver Stritzel, Nick Hühnerbein, Simon Rauch, Itzel Zarate, Lukas Fleischmann, Moike Buck, Attila Lischka, Christian Frey

TL;DR

This work tackles reproducibility challenges in Predictive Process Mining by proposing SPICE, an open-source PyTorch framework that reimplements three prominent deep-learning baselines (Tax2017, Camargo2019, ProcessTransformer) for tasks including Next Activity, Next Timestamp, Suffix, and Remaining Time. SPICE standardizes data splitting, preprocessing, and evaluation to enable fair, cross-dataset comparisons and supports multi-task and autoregressive predictions with configurable samplers and experiment tracking. The authors critique common experimental design flaws in PPM literature, demonstrate reimplementation details, and provide a centralized platform for ablation studies and future benchmarking. They show results across 11 datasets, highlighting reproducibility gains and remaining challenges in achieving faithful metric reproduction. The work aims to move PPM research toward trustworthy baselines and practical applicability through rigorous tooling and reproducible benchmarks.

Abstract

In recent years, Predictive Process Mining (PPM) techniques based on artificial neural networks have evolved as a method for monitoring the future behavior of unfolding business processes and predicting Key Performance Indicators (KPIs). However, many PPM approaches lack reproducibility, transparency in decision making, and usability for incorporating novel datasets and for benchmarking, which makes comparisons among different implementations very difficult. In this paper, we propose SPICE, a Python framework that reimplements three popular, existing deep-learning-based baseline methods for PPM in PyTorch, while designing a common base framework with rigorous configurability to enable reproducible and robust comparison of past and future modelling approaches. We compare SPICE against the originally reported metrics and against fair metrics on 11 datasets.

Paper Structure

This paper contains 15 sections, 12 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: SPICE Workflow
  • Figure 2: Total activity counts for the Helpdesk dataset. The raw dataset consists of 4580 cases.
  • Figure 3: Preprocessing visualized: The raw input trace is wrapped in START and END tokens. When creating input and output pairs, the pairs are padded with the respective padding tokens to ensure equal input and output sizes. A general preprocessing class creates the pairs displayed here, while the encoding of tokens can be specific to one implementation. In next activity and next time prediction settings, only the first element of the suffix is chosen as the target, but the preprocessing stays identical. This allows us to train and evaluate multi-step prediction models by design. Inputs and outputs can also include time and resource features to train multi-task models.
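
The pair construction described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not SPICE's actual API: the function names `make_prefix_suffix_pairs` and `next_activity_target`, the token strings, and the choice to left-pad prefixes and right-pad suffixes are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical token names; the library's real vocabulary may differ.
START, END, PAD = "<START>", "<END>", "<PAD>"


def make_prefix_suffix_pairs(
    trace: List[str], max_len: int
) -> List[Tuple[List[str], List[str]]]:
    """Build padded (prefix, suffix) pairs from a single trace.

    Assumption: prefixes are left-padded and suffixes are right-padded
    so that every input and output has exactly `max_len` tokens.
    """
    tokens = [START] + trace + [END]
    if max_len < len(tokens) - 1:
        raise ValueError("max_len is too small for this trace")

    pairs = []
    # Split after every position except the last, so each suffix
    # contains at least the END token.
    for i in range(1, len(tokens)):
        prefix, suffix = tokens[:i], tokens[i:]
        prefix = [PAD] * (max_len - len(prefix)) + prefix
        suffix = suffix + [PAD] * (max_len - len(suffix))
        pairs.append((prefix, suffix))
    return pairs


def next_activity_target(suffix: List[str]) -> str:
    """Single-step target: only the first element of the suffix."""
    return suffix[0]


if __name__ == "__main__":
    for prefix, suffix in make_prefix_suffix_pairs(["A", "B", "C"], max_len=4):
        print(prefix, "->", suffix, "| next activity:", next_activity_target(suffix))
```

Because the same pairs serve both settings, a suffix model trains on the full padded suffix, while a next activity model reads only its first element. Time or resource features could be carried alongside each token in the same way.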