Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types

Saptarshi Mandal, Seo Taek Kong, Dimitrios Katselis, R. Srikant

TL;DR

The paper addresses crowdsourcing with tasks of inherently distinct types (easy vs. hard) by extending the Dawid-Skene (DS) framework to a two-type model. It introduces a spectral clustering method that partitions tasks by type and achieves perfect clustering when the number of workers scales as $n = \Theta(\log d)$ in the number of tasks $d$. DS-based label estimation can then be applied per type (TE for reliabilities and NP-WMV for labels). The authors provide rigorous concentration and perturbation analyses, including a novel use of low-rank plus sparse structures and eigenvector perturbation results, to guarantee accurate clustering and fast-decaying labeling error. Empirical evaluations on real and pseudo-real datasets show that clustering by task type before label estimation improves performance in most scenarios, validating the practical value of the proposed two-step approach.
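To make the clustering step concrete, the following is a minimal sketch and not the paper's algorithm. It assumes binary $\pm 1$ labels, a simplified hard-easy simulation in which every worker shares the same accuracy within a type, and a plain power iteration on an empirical task-task correlation matrix. All function names and parameter values (`simulate_responses`, `task_correlation`, `leading_eigenvector`, `split_at_largest_gap`, the accuracies 0.95 and 0.55) are hypothetical choices made for illustration.

```python
import random

def simulate_responses(n_workers, is_easy, acc_easy, acc_hard, rng):
    """Return (labels, X); X[i][j] is worker i's +1/-1 answer on task j.

    Assumption: all workers share one accuracy per task type.
    """
    labels = [rng.choice((-1, 1)) for _ in is_easy]
    X = []
    for _ in range(n_workers):
        row = []
        for y, easy in zip(labels, is_easy):
            acc = acc_easy if easy else acc_hard
            row.append(y if rng.random() < acc else -y)
        X.append(row)
    return labels, X

def task_correlation(X):
    """Empirical task-task correlation matrix with a zeroed diagonal."""
    n, d = len(X), len(X[0])
    C = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for l in range(j + 1, d):
            c = sum(row[j] * row[l] for row in X) / n
            C[j][l] = C[l][j] = c
    return C

def leading_eigenvector(C, rng, iters=100):
    """Power iteration for the eigenvector of largest-magnitude eigenvalue."""
    d = len(C)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(C[j][l] * v[l] for l in range(d)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def split_at_largest_gap(scores):
    """Two-way split: threshold at the midpoint of the largest sorted gap."""
    order = sorted(scores)
    gaps = [b - a for a, b in zip(order, order[1:])]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = 0.5 * (order[cut] + order[cut + 1])
    return [s > threshold for s in scores]

rng = random.Random(0)
is_easy = [j < 20 for j in range(40)]            # first 20 tasks are easy
labels, X = simulate_responses(200, is_easy, 0.95, 0.55, rng)
v = leading_eigenvector(task_correlation(X), rng)
# Tasks with large eigenvector magnitude are predicted to be easy.
pred_easy = split_at_largest_gap([abs(x) for x in v])
agreement = sum(p == t for p, t in zip(pred_easy, is_easy)) / len(is_easy)
print(f"fraction of tasks assigned to the correct type: {agreement:.2f}")
```

In this toy setting the expected correlation between two tasks factors into a product of per-task reliabilities, so the magnitude of each leading-eigenvector entry tracks how reliably that task is answered. The wide accuracy gap chosen here makes the two groups easy to separate; real data would likely need the more careful treatment the paper analyzes.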

Abstract

The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
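The need for type-specific weights can be illustrated with a small sketch. It assumes binary $\pm 1$ labels and the classical log-odds weighting $\log\frac{p}{1-p}$ for a worker with accuracy $p$, which is the optimal weighted majority vote for known accuracies and a uniform prior in the one-coin Dawid-Skene setting. The worker accuracies, the dictionary names, and the helper `type_aware_vote` are hypothetical and are not taken from the paper.

```python
import math

def log_odds_weights(accuracies):
    """Log-odds weight log(p / (1 - p)) for each worker accuracy p."""
    return [math.log(p / (1.0 - p)) for p in accuracies]

def weighted_majority_vote(responses, weights):
    """Return +1 or -1 from +1/-1 votes; ties are broken toward +1."""
    score = sum(w * x for w, x in zip(weights, responses))
    return 1 if score >= 0 else -1

# Hypothetical accuracies: worker 0 excels on easy tasks only,
# while workers 1 and 2 are the stronger ones on hard tasks.
ACCURACY_BY_TYPE = {
    "easy": [0.95, 0.60, 0.60],
    "hard": [0.55, 0.80, 0.80],
}
WEIGHTS_BY_TYPE = {k: log_odds_weights(a) for k, a in ACCURACY_BY_TYPE.items()}

def type_aware_vote(responses, task_type):
    """Weighted majority vote using the weight vector for the task's type."""
    return weighted_majority_vote(responses, WEIGHTS_BY_TYPE[task_type])

votes = [1, -1, -1]                      # identical responses on two tasks
easy_label = type_aware_vote(votes, "easy")
hard_label = type_aware_vote(votes, "hard")
print(easy_label, hard_label)
```

The same three responses yield different estimates depending on the task type, because worker 0 dominates the vote on easy tasks but is nearly ignored on hard ones. A single weight vector per worker cannot reproduce both decisions.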

Paper Structure

This paper contains 36 sections, 16 theorems, 155 equations, 1 figure, 4 tables, 1 algorithm.

Key Result

Proposition 3.1

Suppose that $X$ is drawn from the hard-easy model and that the reliability vectors $r_e, r_h$ are known. For any weight vector $w = w(r_e, r_h)$, the probability of error on task $j$ of type $k \in \{e, h\}$ is upper-bounded in terms of an error exponent $\varphi_{n}(w,r_k)$.

Figures (1)

  • Figure 1: Eigenspectrum of $T$ for different datasets: (a) Bluebird, (b) TREC, (c) Dog, (d) Duck, (e) RTE, and (f) Temp. For each plot, the y-axis represents the eigenvalues, and the x-axis represents the corresponding index of each eigenvalue.

Theorems & Definitions (16)

  • Proposition 3.1: Upper bound on expected labeling error (TA-WMV)
  • Proposition 3.2: Lower bound on expected labeling error (TA-WMV)
  • Proposition 3.4
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 3.10
  • Lemma 3.11
  • Theorem 3.12
  • Theorem 3.13
  • Theorem 4.1: Imperfect Clustering
  • ...and 6 more