Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction

P. van Oerle, R. H. Bemthuis, F. A. Bukhsh

TL;DR

This work investigates the data efficiency of LLM-generated process explanations by progressively reducing the behavioral input used for process discovery. A two-LLM pipeline generates explanations from partial models and evaluates them against the full-model reference, using synthetic job-shop logs and the Inductive Miner for model discovery. The study reveals a non-linear cost–quality frontier: meaningful explanations can be produced from significantly reduced inputs, with a practical mid-range sweet spot between $k=100$ and $k=1000$ input events and further gains at $k=100000$ in higher-stakes settings. The findings inform resource-constrained and real-time process intelligence by indicating when larger abstractions are warranted and when data-efficient explanations suffice.
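To make "progressively reducing the behavioral input" concrete, the following is a minimal Python sketch under stated assumptions: a log is a list of `(case_id, activity, timestamp)` tuples, a sublog of size $k$ is the first $k$ events in timestamp order, and a simple directly-follows count stands in for the discovered model. The study itself uses the Inductive Miner; `prefix_sublog`, `directly_follows`, and the toy activity names are hypothetical illustrations, not the authors' code.

```python
from collections import Counter, defaultdict

# Assumed event format: (case_id, activity, timestamp).
Event = tuple[str, str, int]


def prefix_sublog(log: list[Event], k: int) -> list[Event]:
    """Return the first k events of the log in timestamp order."""
    return sorted(log, key=lambda event: event[2])[:k]


def directly_follows(sublog: list[Event]) -> Counter:
    """Count directly-follows pairs (a, b) within each case.

    Illustrative stand-in for a discovered behavioral abstraction;
    the study itself discovers models with the Inductive Miner."""
    traces: dict[str, list[str]] = defaultdict(list)
    for case_id, activity, _timestamp in sublog:  # sublog is already time-ordered
        traces[case_id].append(activity)
    dfg: Counter = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg


# Toy job-shop log with hypothetical activity names.
log = [
    ("c1", "start", 1), ("c2", "start", 2), ("c1", "cut", 3),
    ("c2", "cut", 4), ("c1", "drill", 5), ("c2", "inspect", 6),
]
for k in (2, 4, 6):
    print(k, dict(directly_follows(prefix_sublog(log, k))))
```

With $k=2$ each case holds a single event, so no directly-follows pair exists yet; only at $k=6$ do both branches after `cut` appear. This illustrates in miniature how small prefixes can miss behavior present in the full log.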

Abstract

Large Language Models (LLMs) are increasingly used to generate textual explanations of process models discovered from event logs. Producing explanations from large behavioral abstractions (e.g., directly-follows graphs or Petri nets) can be computationally expensive. This paper reports an exploratory evaluation of explanation quality under progressive behavioral-input reduction, where models are discovered from progressively smaller prefixes of a fixed log. Our pipeline (i) discovers models at multiple input sizes, (ii) prompts an LLM to generate explanations, and (iii) uses a second LLM to assess completeness, bottleneck identification, and suggested improvements. On synthetic logs, explanation quality is largely preserved under moderate reduction, indicating a practical cost-quality trade-off. The study is exploratory, as the scores are LLM-based (comparative signals rather than ground truth) and the data are synthetic. The results suggest a path toward more computationally efficient, LLM-assisted process analysis in resource-constrained settings.
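The three pipeline steps above can be sketched as a loop over input sizes. In this hypothetical Python outline the two LLMs are injected as plain callables (`explain` for LLM$_1$, `score` for LLM$_2$), so no particular model or API is assumed; the names `run_pipeline`, `model_at`, and the stub functions are illustrative, and the study's actual prompts and scoring scale are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable

# Criteria named in the abstract; the score scale is an assumption.
CRITERIA = ("completeness", "bottleneck_identification", "suggested_improvements")

Explainer = Callable[[str], str]                 # LLM_1: model text -> explanation
Scorer = Callable[[str, str], dict[str, float]]  # LLM_2: (explanation, reference) -> scores


@dataclass
class Result:
    k: int
    explanation: str
    scores: dict[str, float]


def run_pipeline(model_at: Callable[[int], str], ks: list[int], full_model: str,
                 explain: Explainer, score: Scorer) -> list[Result]:
    """Explain the model discovered from each k-event sublog and score it against M."""
    results = []
    for k in ks:
        partial_model = model_at(k)                  # (i) discover a model from k events
        explanation = explain(partial_model)         # (ii) LLM_1 explains the partial model
        raw_scores = score(explanation, full_model)  # (iii) LLM_2 scores with M as reference
        results.append(Result(k, explanation, {c: raw_scores[c] for c in CRITERIA}))
    return results


# Hypothetical stubs standing in for discovery and the two LLM calls.
def stub_model_at(k: int) -> str:
    return f"model discovered from {k} events"


def stub_explain(model_text: str) -> str:
    return f"Explanation of the {model_text}."


def stub_score(explanation: str, reference: str) -> dict[str, float]:
    return {c: 0.0 for c in CRITERIA}


results = run_pipeline(stub_model_at, [100, 1000], "full model M",
                       stub_explain, stub_score)
for r in results:
    print(r.k, r.scores)
```

Passing the LLM calls in as callables keeps the loop independent of any specific provider and makes it straightforward to repeat the scoring step several times per $k$.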

Paper Structure

This paper contains 18 sections, 5 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Research pipeline illustrating how process models (behavioral abstractions) are derived from sublogs containing only $k$ events. These models are explained by LLM$_1$ and scored by LLM$_2$ with the full model $M$ as a reference.
  • Figure 2: Average LLM-assigned explanation scores per number of input events ($k$). Each line corresponds to one experiment from Table \ref{tab:tracesandlogs}. Error bars represent score variability across five explanation evaluations per $k$.
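The per-$k$ aggregation described in the Figure 2 caption can be sketched as follows. The caption does not state which variability measure the error bars use, so the sample standard deviation here is an assumption; `summarize` and the placeholder scores are hypothetical and are not results from the study.

```python
from statistics import mean, stdev


def summarize(scores_by_k: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Return (mean, sample standard deviation) of the scores for each k, ordered by k."""
    summary = {}
    for k, scores in sorted(scores_by_k.items()):
        if len(scores) < 2:
            raise ValueError(f"need at least two evaluations for k={k}")
        summary[k] = (mean(scores), stdev(scores))
    return summary


# Placeholder values only, NOT results from the study: five evaluations per k.
demo = {
    1000: [6.0, 7.0, 6.5, 7.5, 6.0],
    10: [4.0, 4.0, 4.0, 4.0, 4.0],
}
for k, (avg, spread) in summarize(demo).items():
    print(f"k={k}: mean={avg:.2f}, sd={spread:.2f}")
```

Plotting these pairs as points with error bars, one line per experiment, yields a figure of the same form as Figure 2.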