Assisted Data Annotation for Business Process Information Extraction from Textual Documents

Julian Neuberger; Han van der Aa; Lars Ackermann; Daniel Buschek; Jannic Herrmann; Stefan Jablonski

TL;DR

The paper tackles data scarcity in business process information extraction by introducing two annotation assistance features, AI-based recommendations and graphical BPMN visualization, implemented in a prototype tool. In a controlled study with 31 participants, recommendations reduced workload by up to $51.0\%$ and improved the F1 score by up to $0.224$ ($+38.9\%$), with novices benefiting most and nearly matching expert performance under assistance. The authors publicly release data and code to spur further research, while acknowledging limitations in visualization quality and potential error propagation. Future work includes expanding assistance features, analyzing interaction data, and conducting broader evaluations across varied user groups.

Abstract

Machine-learning-based generation of process models from natural language process descriptions offers a solution for the time-intensive and expensive process discovery phase. Many organizations have to carry out this phase before they can utilize business process management and its benefits. Yet, research in this direction is severely restrained by an apparent lack of large, high-quality datasets. This lack of data can be attributed to, among other things, an absence of proper tool assistance for dataset creation, resulting in high workloads and inferior data quality. We explore two assistance features to support dataset creation: a recommendation system for identifying process information in the text, and a visualization of the already identified process information as a graphical business process model. A controlled user study with 31 participants shows that assisting dataset creators with recommendations lowers all aspects of workload, by up to $51.0\%$, and significantly improves annotation quality, by up to $38.9\%$. We make all data and code available to encourage further research on additional novel assistance strategies.

Paper Structure (14 sections, 5 figures, 4 tables)

Introduction
Related Work
Concept for Assisted Annotation
The Process Information Extraction Task
Annotation Workflow
Assistance Features
Implementation
Study Design
Results
Subjective Measures
Objective Measures
Effects of Annotator Experience
Conclusion
Disclosure of Interests.

Figures (5)

Figure 1: The first two sentences of document doc-1.2 in the PET dataset, fully annotated with entity mentions, entity references, and relations.
Figure 2: Visualization of the workflow and general architecture of our implementation.
Figure 3: Assigning annotators to a sequence of scenarios based on a balanced Latin square (left), and demographic information about user study participants (right).
Figure 4: Subjective measures for each of the four scenarios from the Study Design section.
Figure 5: Objective measures for each of the four scenarios from the Study Design section.
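
The caption of Figure 3 mentions assigning annotators to scenario sequences via a balanced Latin square. As a minimal sketch of that general technique, not the authors' actual assignment code, the following builds a balanced Latin square for an even number of conditions and cycles participants through its rows. The function names, the integer scenario indices 0 to 3, and the round-robin assignment of 31 participants are assumptions made for illustration only.

```python
from typing import List


def balanced_latin_square(n: int) -> List[List[int]]:
    """Return an n x n balanced Latin square for even n.

    Each condition appears once per row and once per column, and every
    ordered pair of distinct conditions occurs exactly once as
    immediate neighbours within a row.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even n >= 2")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Every further row shifts all entries of the first row by one (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


def assign_sequences(num_participants: int, n_conditions: int) -> List[List[int]]:
    """Hypothetical helper: hand out the square's rows round-robin."""
    square = balanced_latin_square(n_conditions)
    return [square[p % n_conditions] for p in range(num_participants)]


if __name__ == "__main__":
    # Four scenarios, 31 participants (participant count taken from the abstract).
    for participant, order in enumerate(assign_sequences(31, 4)[:4]):
        print(participant, order)
```

With 31 participants and four rows, a round-robin assignment cannot be perfectly even: three rows would be used eight times and one row seven times.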
