Efficient Online Crowdsourcing with Complex Annotations

Reshef Meir, Viet-An Nguyen, Xu Chen, Jagdish Ramakrishnan, Udi Weinsberg

TL;DR

The paper tackles efficient online crowdsourcing for complex annotations by leveraging a linear relation between a worker's average similarity to other workers and that worker's competence, following the Anna Karenina (AK) principle. It introduces the Online AK (OAK) framework and its partitioned extension POAK, along with a POAKi variant that uses Item Response Theory to reduce the number of parameters, and proves a Conditional Anna Karenina theorem for per-type accuracies. Empirically, it demonstrates improved cost-quality trade-offs on four real-world Meta datasets and shows calibration benefits when auditor labels are available. These results provide task-independent tools for online truth discovery that generalize beyond simple categorical labels.

Abstract

Crowdsourcing platforms use various truth discovery algorithms to aggregate annotations from multiple labelers. In an online setting, however, the main challenge is to decide whether to ask for more annotations for each item to efficiently trade off cost (i.e., the number of annotations) for quality of the aggregated annotations. In this paper, we propose a novel approach for general complex annotations (such as bounding boxes and taxonomy paths) that works in an online crowdsourcing setting. We prove that the expected average similarity of a labeler is linear in their accuracy conditional on the reported label. This enables us to infer reported label accuracy in a broad range of scenarios. We conduct extensive evaluations on real-world crowdsourcing data from Meta and show the effectiveness of our proposed online algorithms in improving the cost-quality trade-off.

Paper Structure (55 sections, 6 theorems, 20 equations, 7 figures, 3 tables, 3 algorithms)

Key Result

Theorem 4.1

Fix prior probabilities $q$, a category $\ell$, and a worker $i$ with partial type ${\mathcal{M}}^{(-\ell)}_i$. Then there are constants $\alpha^{(\ell)}, \beta^{(\ell)}$ such that $E[\pi_i^{(\ell)}]=\alpha^{(\ell)} c_i^{(\ell)} + \beta^{(\ell)}$.
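
As a rough illustration of how a linear relation of this form might be used, the sketch below fits $\alpha^{(\ell)}$ and $\beta^{(\ell)}$ by ordinary least squares from a few calibration pairs (for example, workers whose accuracy is known from auditor labels) and then inverts the line to estimate accuracy from an observed average similarity. The function names, the calibration numbers, and the clipping to $[0,1]$ are assumptions made for illustration; this is not the paper's OAK or POAK procedure.

```python
def fit_line(accuracies, similarities):
    """Least-squares fit of similarity = alpha * accuracy + beta (hypothetical helper)."""
    n = len(accuracies)
    if n < 2 or n != len(similarities):
        raise ValueError("need at least two matched (accuracy, similarity) pairs")
    mean_c = sum(accuracies) / n
    mean_pi = sum(similarities) / n
    var_c = sum((c - mean_c) ** 2 for c in accuracies)
    if var_c == 0:
        raise ValueError("accuracies must not all be equal")
    cov = sum((c - mean_c) * (p - mean_pi)
              for c, p in zip(accuracies, similarities))
    alpha = cov / var_c
    beta = mean_pi - alpha * mean_c
    return alpha, beta


def estimate_accuracy(avg_similarity, alpha, beta):
    """Invert pi = alpha * c + beta and clip the estimate to [0, 1]."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero to invert the relation")
    c_hat = (avg_similarity - beta) / alpha
    return min(1.0, max(0.0, c_hat))


# Made-up calibration data lying exactly on pi = 0.6 * c + 0.2.
known_acc = [0.5, 0.7, 0.9]
observed_sim = [0.6 * c + 0.2 for c in known_acc]

alpha, beta = fit_line(known_acc, observed_sim)
c_hat = estimate_accuracy(0.68, alpha, beta)  # accuracy estimate for a new worker
```

In practice the fit would be done separately for each category $\ell$, since the theorem's constants depend on $\ell$.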

Figures (7)

  • Figure 1: General online crowdsourcing process; the components with a solid green frame are the focus of this work.
  • Figure 2: Performance of the POAK algorithm on the Keypoints dataset, compared to a baseline that decides randomly how many labels to use. Each point on the curve corresponds to the percentage of labels used and the associated similarity at a given accuracy threshold. The star marker indicates that POAK achieves accuracy on par with the baseline while using only 81% of the labels. The shaded area (relative AUC) measures the improvement over the baseline.
  • Figure 3: RAUC results (relative to Uniform) of all methods on four datasets. Point estimates and 95% confidence intervals are obtained over 10 trials under each setting.
  • Figure 4: Comparison between different POAK variants.
  • Figure 5: A plot of estimated accuracy $\hat{c}^{(\ell)}_i$ vs. actual accuracy computed over all items in the test set. Each point represents a pair $(i,\ell)$ of worker and label type, where larger dots represent pairs with more samples in the data. The enlarged version of the left panel is in the Appendix.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Theorem 4.1: Conditional Anna Karenina theorem for categorical data
  • Proposition 4.2
  • proof
  • Remark 4.3
  • Theorem B.1
  • Lemma B.2
  • proof
  • proof: Proof of the Conditional Anna Karenina theorem
  • Corollary B.3
  • Remark B.4
  • ...and 2 more