
Leveraging GPT-4o Efficiency for Detecting Rework Anomaly in Business Processes

Mohammad Derakhshan, Paolo Ceravolo, Fatemeh Mohammadi

TL;DR

This paper investigates GPT-4o's effectiveness in detecting rework anomalies in business processes. It converts event logs into structured variants and applies zero-shot, one-shot, and few-shot prompts to datasets with normal, uniform, and exponential anomaly distributions. Using a LangChain/LangGraph-based workflow, the study achieves high accuracy under some distributions (up to 97.94% with few-shot prompting on uniform data) and shows that both the prompting strategy and the anomaly distribution significantly affect performance. Compared with traditional ML baselines, GPT-4o offers strong detection capability with lower false discovery rates under favorable distributions, while token-size limits and skewed distributions can degrade performance. The findings suggest that GPT-4o is a practical, accessible tool for anomaly detection in BPM, with hybrid integration and broader anomaly coverage as directions for future research.
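The comparison with ML baselines rests on two metrics: accuracy and false discovery rate (FDR). As a reminder of how they are computed from binary labels, here is a minimal Python sketch. It assumes each evaluated item (for example, one variant) is labeled simply as "rework" or "no rework"; that evaluation unit, and the names `Confusion`, `confusion`, `accuracy`, and `false_discovery_rate`, are illustrative assumptions rather than the paper's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Counts of a binary confusion matrix (positive = rework anomaly)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Confusion:
    """Compare ground-truth and predicted labels item by item."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    return Confusion(
        tp=sum(1 for t, p in pairs if t and p),
        fp=sum(1 for t, p in pairs if not t and p),
        tn=sum(1 for t, p in pairs if not t and not p),
        fn=sum(1 for t, p in pairs if t and not p),
    )


def accuracy(c: Confusion) -> float:
    """Share of all items labeled correctly: (TP + TN) / total."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_discovery_rate(c: Confusion) -> float:
    """Share of flagged items that are not anomalies: FP / (TP + FP)."""
    flagged = c.tp + c.fp
    return c.fp / flagged if flagged else 0.0


if __name__ == "__main__":
    # Toy labels: True means the item contains a rework anomaly.
    truth = [True, True, False, False, True]
    preds = [True, False, True, False, True]
    c = confusion(truth, preds)
    print(c, accuracy(c), false_discovery_rate(c))
```

A lower FDR means fewer false alarms among the items the model flags. This is why FDR complements accuracy, which can stay high even when many flagged items are wrong, provided most items are correctly left unflagged.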

Abstract

This paper investigates the effectiveness of GPT-4o-2024-08-06, a Large Language Model (LLM) from OpenAI, in detecting business process anomalies, with a focus on rework anomalies. We developed a GPT-4o-based tool that transforms event logs into a structured format and identifies reworked activities within them. The analysis was performed on a synthetic dataset designed to contain rework anomalies but no loops. To evaluate the anomaly detection capabilities of GPT-4o-2024-08-06, we used three prompting techniques: zero-shot, one-shot, and few-shot. We tested these techniques on three anomaly distributions (normal, uniform, and exponential) to identify the most effective approach for each case. The results demonstrate the strong performance of GPT-4o-2024-08-06. On our dataset, the model achieved 96.14% accuracy with one-shot prompting for the normal distribution, 97.94% accuracy with few-shot prompting for the uniform distribution, and 74.21% accuracy with few-shot prompting for the exponential distribution. These results highlight the model's potential as a reliable tool for detecting rework anomalies in event logs and show how the anomaly distribution and the prompting strategy influence its performance.
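The abstract describes two steps: turning an event log into a structured form, and asking the model which activities were reworked under zero-, one-, or few-shot prompting. The minimal Python sketch below illustrates both under stated assumptions. The log is assumed to be a time-ordered list of `(case_id, activity)` pairs. Because the reference process is loop-free, any activity repeated within a trace is assumed to count as rework, which gives a rule-based ground truth. The prompt wording and all function names (`traces_from_log`, `variants`, `reworked_activities`, `build_prompt`) are hypothetical and are not the authors' prompts or tool.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical row format: (case_id, activity), already sorted by timestamp.
Event = Tuple[str, str]


def traces_from_log(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Group time-ordered events by case ID into activity sequences (traces)."""
    traces: Dict[str, List[str]] = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    return dict(traces)


def variants(traces: Dict[str, List[str]]) -> Counter:
    """Count how many cases follow each distinct activity sequence (variant)."""
    return Counter(tuple(trace) for trace in traces.values())


def reworked_activities(variant: Sequence[str]) -> List[str]:
    """Assuming a loop-free reference process, any repeated activity is rework."""
    counts = Counter(variant)
    return sorted(a for a, n in counts.items() if n > 1)


def build_prompt(variant: Sequence[str],
                 examples: Sequence[Tuple[Sequence[str], List[str]]] = ()) -> str:
    """Zero-shot with no examples, one-shot with one, few-shot with several."""
    lines = [
        "Identify the activities that are repeated (reworked) in the trace.",
        "Answer with a comma-separated list of activities, or 'none'.",
    ]
    # Each worked example pairs a variant with its expected answer.
    for ex_variant, ex_answer in examples:
        lines.append("Trace: " + " -> ".join(ex_variant))
        lines.append("Answer: " + (", ".join(ex_answer) or "none"))
    # The variant to classify comes last, with the answer left open.
    lines.append("Trace: " + " -> ".join(variant))
    lines.append("Answer:")
    return "\n".join(lines)
```

For example, `build_prompt(("A", "B", "C"), [(("A", "B", "B", "C"), ["B"])])` yields a one-shot prompt. Sending the string to GPT-4o (for instance through LangChain) is not shown here. Prompting once per variant rather than once per case keeps inputs short, which matters given the token-size limitations the study reports.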

Paper Structure

This paper contains 15 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Anomaly distribution of datasets. From left to right: normal, uniform, and exponential distribution. The x-axis represents the variant index in the dataset, and the y-axis illustrates the anomaly frequency.
  • Figure 2: Anomaly detection workflow
  • Figure 3: Anomaly detection state graph
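Figure 1 plots anomaly frequency against variant index for the three datasets. The minimal Python sketch below shows one way such normal, uniform, and exponential profiles could be produced when building a synthetic dataset. The summary does not describe how the authors generated their data, so several choices here are our own assumptions: the function name `anomaly_profile`, the bell-curve width (n/6), the exponential decay rate (5/n, decaying from the first index), and the largest-remainder rounding.

```python
import math
from typing import List


def anomaly_profile(n_variants: int, total_anomalies: int, shape: str) -> List[int]:
    """Spread total_anomalies over variant indices 0..n_variants-1 following `shape`."""
    if n_variants <= 0 or total_anomalies < 0:
        raise ValueError("need n_variants > 0 and total_anomalies >= 0")
    if shape == "uniform":
        weights = [1.0] * n_variants
    elif shape == "normal":
        # Bell curve centred on the middle index; sigma = n/6 is an arbitrary choice.
        mu = (n_variants - 1) / 2
        sigma = n_variants / 6
        weights = [math.exp(-0.5 * ((i - mu) / sigma) ** 2) for i in range(n_variants)]
    elif shape == "exponential":
        # Decays from the first index; rate = 5/n is an arbitrary choice.
        rate = 5.0 / n_variants
        weights = [math.exp(-rate * i) for i in range(n_variants)]
    else:
        raise ValueError(f"unknown shape: {shape!r}")

    # Largest-remainder rounding so the integer counts sum exactly to the total.
    total_w = sum(weights)
    raw = [total_anomalies * w / total_w for w in weights]
    counts = [math.floor(r) for r in raw]
    remainder = total_anomalies - sum(counts)
    by_fraction = sorted(range(n_variants), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:remainder]:
        counts[i] += 1
    return counts
```

Largest-remainder rounding keeps the total number of injected anomalies fixed across the three shapes. This makes the datasets comparable in size while only the placement of anomalies over variants differs.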