Integrating Domain Knowledge into Process Discovery Using Large Language Models

Ali Norouzifar, Humam Kourani, Marcus Dees, Wil van der Aalst

TL;DR

This paper integrates domain knowledge, written as natural-language descriptions, into the IMr process discovery framework by means of Large Language Models (LLMs). It introduces an interactive system comprising three parties (LLMs, domain experts, and backend services) that extracts declarative constraints from text, validates them, and uses them to steer IMr toward domain-aligned process models. The authors implement a fully functional tool, evaluate multiple LLMs and prompting strategies, and show on real-world UWV data how expert feedback and rule updates improve discovery quality; they also analyse the trade-off between recall and precision and the system's handling of ambiguity. The work advances a human-in-the-loop paradigm in process mining that balances data-driven learning with knowledge-based constraints, making the discovered models more reliable for conformance checking and process improvement.

Abstract

Process discovery aims to derive process models from event logs, providing insights into operational behavior and forming a foundation for conformance checking and process improvement. However, models derived solely from event data may not accurately reflect the real process, as event logs are often incomplete or affected by noise, and domain knowledge, an important complementary resource, is typically disregarded. As a result, the discovered models may lack reliability for downstream tasks. We propose an interactive framework that incorporates domain knowledge, expressed in natural language, into the process discovery pipeline using Large Language Models (LLMs). Our approach leverages LLMs to extract declarative rules from textual descriptions provided by domain experts. These rules are used to guide the IMr discovery algorithm, which recursively constructs process models by combining insights from both the event log and the extracted rules, helping to avoid problematic process structures that contradict domain knowledge. The framework coordinates interactions among the LLM, domain experts, and a set of backend services. We present a fully implemented tool that supports this workflow and conduct an extensive evaluation of multiple LLMs and prompt engineering strategies. Our empirical study includes a case study based on a real-life event log with the involvement of domain experts, who assessed the usability and effectiveness of the framework.
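
To make the idea of declarative rules concrete, the following minimal Python sketch checks two Declare-style rule templates (response and precedence) against a toy event log. It is only an illustration under stated assumptions: the activity names, the function names, and the "satisfied fraction" measure are hypothetical and are not taken from the paper, and the sketch does not call an LLM or the IMr algorithm.

```python
from typing import Callable, Dict, List, Tuple

# A trace is one case: the ordered sequence of activity names.
Trace = Tuple[str, ...]
Rule = Callable[[Trace], bool]


def response(a: str, b: str) -> Rule:
    """Rule: every occurrence of `a` is eventually followed by `b`."""
    def holds(trace: Trace) -> bool:
        pending = False  # True while an `a` is still waiting for a `b`
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending
    return holds


def precedence(a: str, b: str) -> Rule:
    """Rule: `b` may occur only after `a` has occurred at least once."""
    def holds(trace: Trace) -> bool:
        seen_a = False
        for activity in trace:
            if activity == a:
                seen_a = True
            elif activity == b and not seen_a:
                return False
        return True
    return holds


def satisfied_fraction(log: List[Trace], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Fraction of traces satisfying each rule (illustrative measure only)."""
    return {
        name: sum(rule(trace) for trace in log) / len(log)
        for name, rule in rules.items()
    }


# Hypothetical toy log and rules, invented for this example.
log: List[Trace] = [
    ("receive", "check", "pay"),
    ("receive", "pay"),            # pays without a prior check
    ("check", "receive", "pay"),
]
rules: Dict[str, Rule] = {
    "response(receive, pay)": response("receive", "pay"),
    "precedence(check, pay)": precedence("check", "pay"),
}
fractions = satisfied_fraction(log, rules)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

In this toy log the second trace violates the precedence rule, which is the kind of conflict between data and domain knowledge that a rule-guided discovery step would need to resolve.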

Paper Structure

This paper contains 34 sections, 8 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Venn diagram illustrating the relationship between the event log ($L$), the discovered model ($M$), and the actual process ($P$).
  • Figure 2: Overview of the interactive framework, showing how the LLM, domain expert, and backend services collaborate to support rule extraction and process discovery.
  • Figure 3: An overview of the IMr process discovery framework [DBLP:conf/rcis/NorouzifarDA24].
  • Figure 4: Overview of the proposed framework for LLM-assisted process discovery.
  • Figure 5: Screenshot of the interactive tool supporting the proposed framework.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Definition 1: Event Log
  • Definition 2: Rules
  • Definition 3: Language of a Rule
  • Definition 4: Support and Confidence
  • Definition 5: Directly-Follows Graph
  • Definition 6: Binary Cut
  • Definition 7: Constraint Violation
  • Definition 8: Message Structure
  • Definition 9: LLM Task Interface