SNAP: Semantic Stories for Next Activity Prediction

Alon Oved, Segev Shlomov, Sergey Zeltyn, Nir Mashkif, Avi Yaeli

TL;DR

SNAP addresses the underutilization of semantic information in predictive business process monitoring (PBPM) event logs for next activity prediction. It converts traces into semantic stories via feature selection and LLM-generated templates, then fine-tunes language foundation models (e.g., BERT, DeBERTa, GPT-3) for classification. Across six public datasets, SNAP substantially outperforms nine state-of-the-art baselines, with the largest gains on semantically rich data such as conversational RPA logs. Ablation studies show the value of coherent semantic storytelling, the impact of meaningful activity names, and the contribution of user utterances to predictive accuracy. The work introduces a paradigm that leverages textual semantics for PBPM and points to extensions to other predictive tasks, such as outcome or remaining-time prediction, with practical implications for BPM and AI-enabled automation.

Abstract

Predicting the next activity in an ongoing process is one of the most common classification tasks in the business process management (BPM) domain. It allows businesses to optimize resource allocation and enhance operational efficiency, and it aids in risk mitigation and strategic decision-making. This provides a competitive edge in the rapidly evolving confluence of BPM and AI. Existing state-of-the-art AI models for business process prediction do not fully capitalize on the semantic information available within process event logs. As current advanced AI-BPM systems provide semantically richer textual data, the need for new models suited to such data grows. To address this gap, we propose the novel SNAP method, which leverages language foundation models by constructing semantic contextual stories from the historical event logs of a process and using them for next activity prediction. We compare the SNAP algorithm with nine state-of-the-art models on six benchmark datasets and show that SNAP significantly outperforms them, especially on datasets with high levels of semantic content.
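To make the idea of a "semantic story" concrete, the following is a minimal sketch of how a trace prefix might be turned into a narrative paired with its next-activity label. It is an illustration under stated assumptions, not the authors' implementation: the `Event` class, the two sentence templates, the attribute names (`loan_goal`, `amount`, `resource`), and the sample trace are all hypothetical. In SNAP the narrative template is generated by an LLM rather than written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One event in a trace (hypothetical structure for this sketch)."""
    activity: str
    attributes: dict = field(default_factory=dict)


# Hypothetical sentence templates. In SNAP the narrative template comes from
# an LLM; these hand-written strings are stand-ins for illustration only.
CASE_TEMPLATE = "A customer requested a {loan_goal} loan of {amount} euros."
EVENT_TEMPLATE = "Next, '{activity}' was performed by {resource}."


def build_story(case_attrs, prefix):
    """Render case-level attributes and a trace prefix as one text story."""
    sentences = [CASE_TEMPLATE.format(**case_attrs)]
    for event in prefix:
        resource = event.attributes.get("resource", "an unknown resource")
        sentences.append(
            EVENT_TEMPLATE.format(activity=event.activity, resource=resource)
        )
    return " ".join(sentences)


def make_training_pairs(case_attrs, trace):
    """For each prefix of length k, pair its story with the (k+1)-th activity."""
    return [
        (build_story(case_attrs, trace[:k]), trace[k].activity)
        for k in range(1, len(trace))
    ]


# Made-up sample data, loosely modeled on a loan application process.
case = {"loan_goal": "car", "amount": 15000}
trace = [
    Event("Create Application", {"resource": "User_1"}),
    Event("Validate Application", {"resource": "User_2"}),
    Event("Create Offer"),
]
pairs = make_training_pairs(case, trace)
for story, label in pairs:
    print(label, "<-", story)
```

Each `(story, label)` pair could then serve as one training example for a text classifier such as a fine-tuned BERT model, with the set of activity names as the label space.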

Paper Structure

This paper contains 15 sections, 2 figures, 6 tables, and 1 algorithm.

Figures (2)

  • Figure 1: The LLM prompt used to create a story narrative. It includes a story output example and its corresponding input features.
  • Figure 2: A generated story for a loan application example. The prompt in Figure 1 was used as an input to an LLM (Llama-2).