Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model

Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan

TL;DR

This paper reframes budget-constrained data annotation as a data triage problem between triage-to-human and triage-to-model, and proposes SANT, a selective annotation framework that combines an active learning (AL)-based mechanism, an error-aware triage (EAT) module, and a bi-weighting fusion to allocate data for high-quality annotation. By jointly optimizing signals that prioritize informative data for experts and easy data for automated annotation, SANT achieves higher annotation quality than baselines across sentiment, knowledge-graph completion, and tagging tasks, especially for model-annotated data. The work demonstrates that incorporating triage-to-model data yields substantial gains and provides a blueprint for budget-aware, human-in-the-loop annotation systems, including practical deployment considerations and limitations. Overall, SANT establishes a landmark approach for budget-conscious data annotation and opens avenues for future triage-based annotation research with larger annotators and cost-aware strategies.

Abstract

To obtain high-quality annotations under a limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotation to improve the model's predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficient allocation of the annotation budget, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.
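
To make the triage idea concrete, here is a minimal sketch of budget-constrained allocation. It is not SANT's actual scoring: the `hardness` scores, the `triage` function, and the simple top-k rule are all assumptions made for illustration, standing in for whatever signal the error-aware triage and bi-weighting mechanisms produce. The sketch only shows the allocation step: the hardest items, up to the expert budget, go to the human, and the rest go to the model.

```python
from typing import List, Tuple


def triage(hardness: List[float], budget: int) -> Tuple[List[int], List[int]]:
    """Split item indices into (to_human, to_model) under an expert budget.

    `hardness` is a hypothetical per-item score; higher means the model is
    assumed more likely to mislabel the item. This is NOT SANT's scoring.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank items from hardest to easiest; ties keep their original order.
    ranked = sorted(range(len(hardness)), key=lambda i: -hardness[i])
    to_human = sorted(ranked[:budget])  # hardest items go to the expert
    to_model = sorted(ranked[budget:])  # remaining items go to the model
    return to_human, to_model


# Illustrative scores only; not taken from the paper.
scores = [0.9, 0.1, 0.6, 0.2, 0.8]
human_idx, model_idx = triage(scores, budget=2)
print(human_idx, model_idx)  # [0, 4] [1, 2, 3]
```

With a budget of two expert annotations, the two highest-scoring items are routed to the expert and the other three are left for the model.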

Paper Structure (28 sections, 4 equations, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Semi-automatic annotation focuses on triage-to-human data, overlooking triage-to-model data.
  • Figure 2: Annotation quality of SANT and ChatGPT-based automatic annotation. The x-axis shows the proportion of the annotation budget. As the annotation tasks become increasingly difficult (from task a to task c), human experts remain indispensable for achieving high-quality annotation, despite the efficiency of adopting LLMs as annotators.
  • Figure 3: Model predictive ability evaluation on the same extra test dataset. While prioritizing triage-to-human data by AL has some advantages over triage-to-model data in promoting model predictive ability, this is not always the case (e.g., SANT w/o EAT loses its advantage in the knowledge graph completion task).
  • Figure 4: Ablation study on $L_m$ and hyper-parameter tuning analysis on $\tau$. We simply choose $\tau=0.3$ in our experiments.