Detecting Undesired Process Behavior by Means of Retrieval Augmented Generation

Michael Grohs, Adrian Rebmann, Jana-Rebecca Rehse

TL;DR

Conformance checking requires an explicit process model, which is not always available. This work addresses that limitation with a Retrieval Augmented Generation (RAG) approach that detects undesired process behavior without fine-tuning. An offline component populates a knowledge base with both desired and undesired traces derived from a collection of models $\mathcal{M}$. At inference time, an online component combines examples retrieved from this knowledge base with context from the event log and prompts an LLM to identify deviations in each trace $t \in L$, covering five deviation patterns: inserted, skipped, repeated, replaced, and swapped. In the evaluation, the method is more accurate and more robust than fine-tuned and vanilla baselines on both synthetic and real-world logs, while reducing training requirements and enabling context-rich inference via retrieved examples. A real-life evaluation on the BPI Challenge 2019 log confirms practical applicability, suggesting that RAG is a viable, scalable alternative for detecting undesired behavior in complex processes where models are unavailable or costly to maintain. The work also discusses limitations, such as seed variability and longer inference times, and points to future work, including more sophisticated trace embeddings and process perspectives beyond control-flow.
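To make the five deviation patterns concrete, the following minimal sketch applies each of them to a small desired trace. The helper names, the example activities, and the exact semantics of each pattern are assumptions chosen for illustration; the paper's own definitions may differ in detail.

```python
from typing import List

Trace = List[str]  # a trace is modeled here as an ordered list of activity labels


def insert_activity(trace: Trace, index: int, activity: str) -> Trace:
    """Inserted: an extra activity appears at the given position."""
    return trace[:index] + [activity] + trace[index:]


def skip_activity(trace: Trace, index: int) -> Trace:
    """Skipped: the activity at the given position is missing."""
    return trace[:index] + trace[index + 1:]


def repeat_activity(trace: Trace, index: int) -> Trace:
    """Repeated: the activity at the given position occurs twice in a row."""
    return trace[:index + 1] + [trace[index]] + trace[index + 1:]


def replace_activity(trace: Trace, index: int, activity: str) -> Trace:
    """Replaced: a different activity takes the place of the expected one."""
    return trace[:index] + [activity] + trace[index + 1:]


def swap_activities(trace: Trace, i: int, j: int) -> Trace:
    """Swapped: two activities occur in exchanged positions."""
    result = list(trace)
    result[i], result[j] = result[j], result[i]
    return result


# Hypothetical desired trace of an order-handling process (illustrative only).
desired = ["Create order", "Approve order", "Ship goods", "Send invoice"]

for name, deviating in [
    ("inserted", insert_activity(desired, 2, "Change price")),
    ("skipped", skip_activity(desired, 1)),
    ("repeated", repeat_activity(desired, 2)),
    ("replaced", replace_activity(desired, 1, "Reject order")),
    ("swapped", swap_activities(desired, 1, 2)),
]:
    print(f"{name}: {deviating}")
```

Pairs of a desired trace and its deviating variants, like the ones printed here, are the kind of labeled examples a knowledge base of desired and undesired behavior could hold.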

Abstract

Conformance checking techniques detect undesired process behavior by comparing process executions that are recorded in event logs to desired behavior that is captured in a dedicated process model. If such models are not available, conformance checking techniques are not applicable, but organizations might still be interested in detecting undesired behavior in their processes. To enable this, existing approaches use Large Language Models (LLMs), assuming that they can learn to distinguish desired from undesired behavior through fine-tuning. However, fine-tuning is highly resource-intensive and the fine-tuned LLMs often do not generalize well. To address these limitations, we propose an approach that requires neither a dedicated process model nor resource-intensive fine-tuning to detect undesired process behavior. Instead, we use Retrieval Augmented Generation (RAG) to provide an LLM with direct access to a knowledge base that contains both desired and undesired process behavior from other processes, assuming that the LLM can transfer this knowledge to the process at hand. Our evaluation shows that our approach outperforms fine-tuned LLMs in detecting undesired behavior, demonstrating that RAG is a viable alternative to resource-intensive fine-tuning, particularly when enriched with relevant context from the event log, such as frequent traces and activities.
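The retrieval step described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the knowledge base entries, the function names, and the prompt wording are hypothetical, and `difflib.SequenceMatcher` is used only as a simple stand-in for whatever trace similarity or embedding the approach actually relies on. The sketch stops at building the prompt and makes no LLM call.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Tuple

Trace = Tuple[str, ...]

# Hypothetical knowledge base: traces from *other* processes, labeled as
# desired or undesired (illustrative entries, not data from the paper).
KNOWLEDGE_BASE: List[Tuple[Trace, str]] = [
    (("Receive request", "Check request", "Approve request"), "desired"),
    (("Receive request", "Approve request"), "undesired: skipped 'Check request'"),
    (("Create invoice", "Send invoice", "Receive payment"), "desired"),
]


def similarity(a: Trace, b: Trace) -> float:
    """Stand-in similarity: ratio of matching activities in sequence order."""
    return SequenceMatcher(None, a, b).ratio()


def retrieve(trace: Trace, k: int = 2) -> List[Tuple[Trace, str]]:
    """Return the k knowledge-base entries most similar to the given trace."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda e: similarity(trace, e[0]), reverse=True)
    return ranked[:k]


def log_context(log: List[Trace], n: int = 2) -> Tuple[List[Trace], List[str]]:
    """Collect the n most frequent traces and activities of the event log."""
    frequent_traces = [t for t, _ in Counter(log).most_common(n)]
    activity_counts = Counter(a for t in log for a in t)
    frequent_activities = [a for a, _ in activity_counts.most_common(n)]
    return frequent_traces, frequent_activities


def build_prompt(trace: Trace, log: List[Trace]) -> str:
    """Assemble a prompt from retrieved examples and log context."""
    traces, activities = log_context(log)
    lines = ["Examples from other processes:"]
    lines += [f"- {' -> '.join(t)} [{label}]" for t, label in retrieve(trace)]
    lines.append("Frequent traces in this log:")
    lines += [f"- {' -> '.join(t)}" for t in traces]
    lines.append("Frequent activities in this log: " + ", ".join(activities))
    lines.append("Trace to assess: " + " -> ".join(trace))
    lines.append("Which undesired behavior, if any, does this trace contain?")
    return "\n".join(lines)


# Hypothetical event log and trace under inspection.
log = [
    ("Receive request", "Check request", "Approve request"),
    ("Receive request", "Check request", "Approve request"),
    ("Receive request", "Approve request"),
]
print(build_prompt(("Receive request", "Approve request"), log))
```

Because the examples come from other processes and the context comes from the log at hand, the prompt carries both kinds of information without any model weights being updated.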

Paper Structure

This paper contains 12 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of our approach for detecting undesired behavior
  • Figure 2: Excerpt from prompt template. Boldness indicates dynamic parts that are adjusted per trace.
  • Figure 3: Exemplary parsed output