Supporting Annotators with Affordances for Efficiently Labeling Conversational Data

Austin Z. Henley; David Piorkowski

TL;DR

This work addresses the bottleneck of creating ground-truth labels for machine learning by introducing CAL, a production-quality, affordance-rich annotation interface for conversational data. CAL integrates code-set documentation, prevents invalid labels, guides label selection via a wizard, and provides quick access to previous labels, all within a single tool. In a within-subjects study against a standard spreadsheet, CAL significantly reduced cognitive load ($p < 0.05$) and achieved higher usability ($p < 0.001$), with most participants preferring CAL and no one preferring the spreadsheet. The findings demonstrate that integrated affordances can markedly improve annotator experience and efficiency, with future work aimed at automated label suggestions and fatigue monitoring to further enhance reliability and throughput.

Abstract

Machine learning-based systems would not be as ubiquitous as they are today without well-labeled ground truth data, and they rely on substantial amounts of it. Unfortunately, crowdsourced labeling is time-consuming and expensive. To address the concerns of effort and tedium, we designed CAL, a novel interface to aid in data labeling. We made several key design decisions for CAL: preventing inapt labels from being selected, guiding users toward an appropriate label when they need assistance, incorporating labeling documentation into the interface, and providing an efficient means to view previous labels. We built a production-quality implementation of CAL and report a user study that compares CAL to a standard spreadsheet. Key findings of our study are that users of CAL reported lower cognitive load without an increase in task time, rated CAL as easier to use, and preferred CAL over the spreadsheet.

Paper Structure (30 sections, 9 figures, 1 table)

Introduction
Background & Related Work
Data Labeling
Studies on Data Labeling
Tools to Support Labeling
Tool Design: CAL
Basic Features
Design Rationale and Novel Features
Integrate labeling code set
Prevent inapt labels
Assist in choosing the best label
Efficient viewing of previous labels
Implementation Details
Evaluation Method
Participants
...and 15 more sections

Figures (9)

Figure 1: An example spreadsheet used for labeling conversational data. It includes the (a) transcript with the human utterance in the first column and the chatbot's response in the second column. The annotator enters y or n into the (b) three columns representing different categories in this example code set.
Figure 2: (a) The CAL web application's data labeling view, including the (b) conversation transcript. To label the utterances, a user selects an individual utterance and uses the (c) labeling interface to (d) select the applicable labels. The user can track their progress with the (e) progress bar and navigate to the next/previous conversation.
Figure 3: An example of labeling documentation that helps labelers in choosing the correct label by answering a series of yes/no questions. The wizard feature presents these questions to the user one question at a time. After answering all of the questions, CAL will select the label for the user and notify the user that the label was selected.
Figure 4: An example of using the Wizard feature to aid an annotator in selecting the most appropriate label. After clicking the "?" button to initiate the wizard, the annotator is asked a series of binary questions. Depending on the answers provided to these questions, a label is automatically selected on behalf of the annotator. The annotator can revisit the wizard or modify the selected label if needed.
Figure 5: The inter-rater reliability view that displays the amount of agreement between data labelers. In this case, the two labelers have 37.5% to 50% agreement on three different categories using Jaccard's Index.
...and 4 more figures
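
The Figure 5 caption reports agreement between two labelers using Jaccard's Index. As a minimal sketch of that measure, and not CAL's actual implementation, the Python below assumes that agreement for one category is computed over the sets of utterance IDs each labeler tagged with that category. The function name `jaccard_index` and the example utterance IDs are hypothetical, chosen only to reproduce the 37.5% and 50% values mentioned in the caption.

```python
from typing import Hashable, Iterable


def jaccard_index(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of labeled item IDs.

    Assumption: if neither labeler applied the category to any item,
    the two labelers are treated as being in full agreement (1.0).
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical utterance IDs that each labeler tagged with one category.
labeler_1 = [1, 2, 3, 4, 5]
labeler_2 = [3, 4, 5, 6, 7, 8]

# 3 shared utterances out of 8 distinct ones.
print(jaccard_index(labeler_1, labeler_2))  # 0.375

# 2 shared utterances out of 4 distinct ones.
print(jaccard_index([1, 2, 3], [2, 3, 4]))  # 0.5
```

Because the index ignores utterances that neither labeler tagged, it only reflects agreement on items where at least one labeler applied the category.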
