Assisted Data Annotation for Business Process Information Extraction from Textual Documents

Julian Neuberger, Han van der Aa, Lars Ackermann, Daniel Buschek, Jannic Herrmann, Stefan Jablonski

TL;DR

The paper tackles data scarcity in business process information extraction by introducing two annotation assistance features, AI-based recommendations and a graphical BPMN visualization, both implemented in a prototype tool. In a controlled study with 31 participants, recommendations reduced workload by up to $51.0\%$ and improved the F1 score by up to $+0.224$ ($+38.9\%$), with novices benefiting most and nearly matching expert performance under assistance. The authors publicly release data and code to spur further research, while acknowledging limitations in visualization quality and potential error propagation. Future work includes expanding the assistance features, analyzing interaction data, and conducting broader evaluations across varied user groups.
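
The relationship between the absolute and the relative F1 gain quoted above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: it assumes that both figures describe the same comparison (which the summary does not state explicitly), and the function names are illustrative rather than taken from the authors' code. The implied scores are derived from the two quoted figures and are not results reported in this summary.

```python
def relative_change(baseline: float, assisted: float) -> float:
    """Relative change of `assisted` with respect to `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (assisted - baseline) / baseline


def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline implied when an absolute and a relative gain describe the same comparison."""
    if relative_gain == 0:
        raise ValueError("relative_gain must be non-zero")
    return absolute_gain / relative_gain


# Figures quoted in the TL;DR: +0.224 absolute, +38.9% relative.
# ASSUMPTION: both numbers refer to the same pair of scenarios.
ABSOLUTE_GAIN = 0.224
RELATIVE_GAIN = 0.389

baseline_f1 = implied_baseline(ABSOLUTE_GAIN, RELATIVE_GAIN)
assisted_f1 = baseline_f1 + ABSOLUTE_GAIN

print(f"implied unassisted F1: {baseline_f1:.3f}")
print(f"implied assisted F1:   {assisted_f1:.3f}")
print(f"relative change check: {relative_change(baseline_f1, assisted_f1):+.1%}")
```

Under that assumption, a relative gain is simply the absolute gain divided by the unassisted score, so the two quoted numbers together pin down both scores.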

Abstract

Machine-learning-based generation of process models from natural language process descriptions offers a solution to the time-intensive and expensive process discovery phase. Many organizations have to carry out this phase before they can utilize business process management and its benefits. Yet research in this direction is severely restrained by an apparent lack of large, high-quality datasets. This lack of data can be attributed to, among other things, an absence of proper tool assistance for dataset creation, resulting in high workloads and inferior data quality. We explore two assistance features to support dataset creation: a recommendation system for identifying process information in the text, and a visualization of the already identified process information as a graphical business process model. A controlled user study with 31 participants shows that assisting dataset creators with recommendations lowers all aspects of workload, by up to $51.0\%$, and significantly improves annotation quality, by up to $38.9\%$. We make all data and code available to encourage further research on additional novel assistance strategies.

Paper Structure (14 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: The first two sentences of document doc-1.2 in the PET dataset, fully annotated with entity mentions, entity references, and relations.
  • Figure 2: Visualization of the workflow and general architecture of our implementation.
  • Figure 3: Assigning annotators to a sequence of scenarios based on a balanced Latin square (left), and demographic information about user study participants (right).
  • Figure 4: Subjective measures for each of the four scenarios from the study design section.
  • Figure 5: Objective measures for each of the four scenarios from the study design section.
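
The caption of Figure 3 mentions assigning annotators to scenario sequences with a balanced Latin square. As an illustration of that idea, here is a minimal sketch of the standard construction for an even number of conditions (often called a Williams design). It is not the authors' assignment code: the scenario labels `S1` to `S4` are hypothetical placeholders, and the round-robin assignment of participants to rows is an assumption.

```python
from typing import List


def balanced_latin_square(n: int) -> List[List[int]]:
    """Build an n x n balanced Latin square for even n.

    Each condition appears once per row and once per column, and each
    ordered pair of distinct conditions appears as immediate neighbours
    exactly once across all rows.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even n >= 2")
    # The first row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        if position % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Every further row shifts the first row by one (mod n).
    return [[(value + shift) % n for value in first] for shift in range(n)]


def assign_sequences(num_participants: int, labels: List[str]) -> List[List[str]]:
    """Assign participants to rows of the square in round-robin order (assumption)."""
    square = balanced_latin_square(len(labels))
    return [
        [labels[index] for index in square[participant % len(labels)]]
        for participant in range(num_participants)
    ]


# Hypothetical placeholder labels for the four study scenarios.
SCENARIOS = ["S1", "S2", "S3", "S4"]

for participant, sequence in enumerate(assign_sequences(31, SCENARIOS)[:4]):
    print(participant, sequence)
```

With 31 participants and four rows, a round-robin assignment cannot be perfectly even: three rows are used eight times and one row seven times.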