Human-LLM Hybrid Text Answer Aggregation for Crowd Annotations

Jiyi Li

TL;DR

This paper proposes a human-LLM hybrid text answer aggregation method with a Creator-Aggregator Multi-Stage (CAMS) crowdsourcing framework. The results show that the approach, which relies on collaboration between crowd workers and LLMs, is effective.

Abstract

Quality is a crucial issue for crowd annotations, and answer aggregation is an important type of solution. The annotations eventually collected are the aggregated answers estimated from multiple crowd answers to the same instance, rather than the individual crowd answers themselves. Recently, the capability of Large Language Models (LLMs) on data annotation tasks has attracted interest from researchers. Most existing studies focus on the average performance of individual crowd workers; several recent works have studied aggregation on categorical labels and the use of LLMs as label creators. However, aggregation on text answers and the role of LLMs as aggregators are not yet well studied. In this paper, we investigate the capability of LLMs as aggregators in close-ended crowd text answer aggregation. We propose a human-LLM hybrid text answer aggregation method with a Creator-Aggregator Multi-Stage (CAMS) crowdsourcing framework. We conduct experiments on public crowdsourcing datasets. The results show the effectiveness of our approach based on the collaboration of crowd workers and LLMs.

Paper Structure

This paper contains 19 sections, 8 figures, 12 tables.

Figures (8)

  • Figure 1: Existing Single-Stage framework for crowdsourced text answer aggregation.
  • Figure 2: Our Creator-Aggregator Multi-Stage (CAMS) framework for crowdsourced text answer aggregation.
  • Figure 3: GLEU results for different numbers of L.A. (GPT-4 (O)), based on SMS and RASA.
  • Figure 4: METEOR results for different numbers of L.A. (GPT-4 (O)), based on SMS and RASA.
  • Figure 5: Embedding similarity results for different numbers of L.A. (GPT-4 (O)), based on SMS and RASA.
  • ...and 3 more figures