
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks

Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa

TL;DR

This work targets semantics-aware process mining by defining three tasks that require an understanding of process semantics rather than raw log statistics: trace-level semantic anomaly detection (T-SAD), activity-level semantic anomaly detection (A-SAD), and semantic next-activity prediction (S-NAP). It introduces a large corpus of process behaviors derived from BPMN diagrams, together with corresponding benchmarking datasets, enabling rigorous evaluation of LLMs under in-context learning and supervised fine-tuning. Across extensive experiments, LLMs perform poorly out of the box and with only a few in-context examples, but fine-tuned decoder LLMs (Llama, Mistral) consistently surpass an encoder baseline; A-SAD proves the easiest task and S-NAP the hardest. The findings suggest a practical pathway for integrating LLM-based semantic reasoning into process mining pipelines, albeit at substantial training cost and with task-specific data requirements, and the benchmarks are made available for reproducible evaluation by the community.
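To make the three task formats more concrete, the sketch below shows what a labeled instance of each might look like on a toy process. It is a hypothetical illustration, not the benchmark's construction procedure: the process is represented by an explicit, made-up set of allowed traces (the paper derives behavior from BPMN diagrams), and the labeling rules (exact trace membership, an eventually-follows check on an ordered activity pair, and prefix matching with an assumed `[END]` marker) are simplifying assumptions. All names (`ALLOWED_TRACES`, `t_sad_label`, `a_sad_label`, `s_nap_targets`) are invented for this example.

```python
from itertools import combinations

# Hypothetical stand-in for the correct behavior of one process: a set of
# allowed traces. This toy set is an assumption, not data from the benchmark.
ALLOWED_TRACES = {
    ("create order", "check credit", "approve order", "ship goods"),
    ("create order", "check credit", "reject order"),
}


def t_sad_label(trace, allowed=ALLOWED_TRACES):
    """Trace-level check: True if the whole trace is anomalous,
    i.e. it does not match any allowed trace exactly."""
    return tuple(trace) not in allowed


def a_sad_label(pair, allowed=ALLOWED_TRACES):
    """Activity-level check: True if the ordered pair (a, b) is anomalous,
    i.e. no allowed trace contains a followed (eventually) by b."""
    a, b = pair
    for t in allowed:
        # combinations over positions yields index pairs with i < j
        for i, j in combinations(range(len(t)), 2):
            if t[i] == a and t[j] == b:
                return False
    return True


def s_nap_targets(prefix, allowed=ALLOWED_TRACES):
    """Next-activity prediction: the set of activities that may follow the
    prefix in some allowed trace ("[END]" if such a trace stops there)."""
    prefix = tuple(prefix)
    n = len(prefix)
    targets = set()
    for t in allowed:
        if t[:n] == prefix:
            targets.add(t[n] if n < len(t) else "[END]")
    return targets
```

For instance, `s_nap_targets(("create order", "check credit"))` yields both `"approve order"` and `"reject order"`, which shows why next-activity prediction can have more than one semantically valid answer.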

Abstract

The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report that LLMs can support process analysis and even, to some extent, reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, both of which involve considering the meaning of activities and their interrelations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works at the intersection of LLMs and process mining focus only on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to acquire process mining knowledge post hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box or when provided with only a handful of in-context examples, but (2) they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.

Paper Structure

This paper contains 16 sections, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Illustration of discriminative classification with an encoder LM.
  • Figure 2: Illustration of constrained generative fine-tuning of a decoder LM.
  • Figure 3: One-shot in-context-learning prompt for the S-NAP task.
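As a rough illustration of the setting in Figure 3, the sketch below assembles a one-shot prompt for next-activity prediction and maps the model's free-form reply back onto the set of known activities. The prompt wording, the function names (`build_snap_prompt`, `constrain_answer`), and the example activities are all hypothetical; the actual prompt in Figure 3 is not reproduced here. Note also that `constrain_answer` is a post-hoc filter, a deliberate simplification: the constrained generative setup of Figure 2 may restrict outputs differently (for example, during decoding).

```python
def build_snap_prompt(activities, example_prefix, example_answer, query_prefix):
    """Assemble a hypothetical one-shot prompt: one solved example,
    followed by the query prefix whose next activity is asked for."""
    acts = ", ".join(sorted(activities))
    lines = [
        "Activities of the process: " + acts,
        "Given the executed activities, name the next activity.",
        "",
        "Executed: " + " -> ".join(example_prefix),
        "Next: " + example_answer,
        "",
        "Executed: " + " -> ".join(query_prefix),
        "Next:",
    ]
    return "\n".join(lines)


def constrain_answer(generated, activities):
    """Map free-form model output to a known activity label, or None if the
    output starts with none of them. Longer labels are tried first so that
    a label is not shadowed by a shorter label that is its prefix."""
    text = generated.strip().lower()
    for act in sorted(activities, key=len, reverse=True):
        if text.startswith(act.lower()):
            return act
    return None
```

A typical use is to send `build_snap_prompt(...)` to a model and pass its reply through `constrain_answer`, counting a `None` result as an invalid prediction.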