Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation

Yushi Sun, Jiachuan Wang, Peng Cheng, Libin Zheng, Lei Chen, Jian Yin

TL;DR

The approach uses two estimation modules, one that statistically analyzes the cross-domain correlation and one that dynamically simulates the learning gain of workers, and it outperforms the baselines on both real-world and synthetic datasets.

Abstract

Crowdsourced annotation is drawing increasing attention, and it relies on an effective scheme for selecting workers from a given pool. Existing methods select workers based on their performance on tasks with ground truth, but they miss two important points. 1) The historical performance of workers on other tasks. In real-world scenarios, workers must solve a new task whose correlation with previous tasks is not well known before training; this is called the cross-domain setting. 2) The dynamic nature of worker performance, since workers learn from the ground truth. In this paper, we consider both factors in designing an allocation scheme named the cross-domain-aware worker selection with training approach. Our approach uses two estimation modules to statistically analyze the cross-domain correlation and to dynamically simulate the learning gain of workers. We also give a framework with a theoretical analysis of the worker elimination process. To validate the effectiveness of our methods, we collect two novel real-world datasets and generate synthetic datasets. The experimental results show that our method outperforms the baselines on both real-world and synthetic datasets.

Paper Structure (24 sections, 11 equations, 7 figures, 5 tables, 4 algorithms)

Figures (7)

  • Figure 1: Cross-domain worker selection. The left shows the two prior domains: plane and elephant. The right shows the target domain: flower. We record workers' historical accuracy on the two prior domains and estimate the accuracy on the target domain, to effectively train and select desired workers.
  • Figure 2: The definition of the cross-domain-aware worker selection with training problem. The worker selection algorithm assigns learning tasks to workers, records and analyzes the learning task results, and performs worker selection. The performance of the selected workers is evaluated on the working tasks of the target domain.
  • Figure 3: The general pipeline of our cross-domain-aware worker selection with training algorithm.
  • Figure 4: An illustration of a learning task (left) and its corresponding ground truth answer (right). The learning tasks are displayed to the workers. After the workers complete their answers for the current round, the ground truth is revealed so that they can learn from it.
  • Figure 5: Sensitivity analysis of the initialized annotation accuracy of the target domain, $a_T = \frac{1}{1+e^{\beta_T}}$, on different datasets.
  • ...and 2 more figures
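
The formula in the Figure 5 caption, $a_T = \frac{1}{1+e^{\beta_T}}$, maps a real-valued parameter $\beta_T$ to an accuracy in $(0, 1)$. The minimal sketch below only evaluates that mapping and its inverse. The function names `accuracy_from_beta` and `beta_from_accuracy` are hypothetical, and the sample values of $\beta_T$ are illustrative rather than taken from the paper.

```python
import math


def accuracy_from_beta(beta_t: float) -> float:
    """Evaluate a_T = 1 / (1 + e^{beta_T}), the mapping in the Figure 5 caption."""
    return 1.0 / (1.0 + math.exp(beta_t))


def beta_from_accuracy(a_t: float) -> float:
    """Invert the mapping: beta_T = ln((1 - a_T) / a_T), valid for 0 < a_T < 1."""
    if not 0.0 < a_t < 1.0:
        raise ValueError("a_t must lie strictly between 0 and 1")
    return math.log((1.0 - a_t) / a_t)


if __name__ == "__main__":
    # Illustrative beta_T values only; they are not the settings used in the paper.
    for beta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"beta_T = {beta:+.1f} -> a_T = {accuracy_from_beta(beta):.3f}")
```

Under this form, $\beta_T = 0$ corresponds to $a_T = 0.5$, and larger values of $\beta_T$ give a lower initialized accuracy.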

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6