Spectral Clustering for Crowdsourcing with Inherently Distinct Task Types

Saptarshi Mandal; Seo Taek Kong; Dimitrios Katselis; R. Srikant

TL;DR

The paper addresses crowdsourcing with tasks of inherently distinct types (easy vs hard) by extending the Dawid-Skene framework to a two-type model. It introduces a spectral clustering method to partition tasks by type, achieving perfect clustering when the number of workers scales as $n = \Theta(\log d)$, enabling per-type application of DS-based label estimation (TE for reliabilities and NP-WMV for labels). The authors provide rigorous concentration and perturbation analyses, including a novel use of low-rank plus sparse structures and eigenvector perturbation results, to guarantee accurate clustering and fast-decaying labeling error. Empirical evaluations on real and pseudo-real datasets show that clustering by task type before label estimation improves performance in most scenarios, validating the practical value of the proposed two-step approach.

Abstract

The Dawid-Skene model is the most widely assumed model in the analysis of crowdsourcing algorithms that estimate ground-truth labels from noisy worker responses. In this work, we are motivated by crowdsourcing applications where workers have distinct skill sets and their accuracy additionally depends on a task's type. While weighted majority vote (WMV) with a single weight vector for each worker achieves the optimal label estimation error in the Dawid-Skene model, we show that different weights for different types are necessary for a multi-type model. Focusing on the case where there are two types of tasks, we propose a spectral method to partition tasks into two groups that cluster tasks by type. Our analysis reveals that task types can be perfectly recovered if the number of workers $n$ scales logarithmically with the number of tasks $d$. Any algorithm designed for the Dawid-Skene model can then be applied independently to each type to infer the labels. Numerical experiments show how clustering tasks by type before estimating ground-truth labels enhances the performance of crowdsourcing algorithms in practical applications.
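The clustering step described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the similarity matrix (squared, normalized task-task agreement), the midpoint threshold on the principal eigenvector, the 50/50 type mix, and the helper names `simulate_responses` and `cluster_tasks` are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_responses(n, d, r_easy, r_hard, rng):
    """Draw labels, task types, and an n x d matrix of +/-1 worker responses."""
    labels = rng.choice([-1, 1], size=d)
    is_easy = rng.random(d) < 0.5  # assumed 50/50 mix of task types
    # Reliability of worker i on task j depends only on the task's type.
    r = np.where(is_easy[None, :], r_easy[:, None], r_hard[:, None])
    # A worker with reliability r answers correctly with probability (1 + r) / 2.
    correct = rng.random((n, d)) < (1.0 + r) / 2.0
    X = np.where(correct, labels[None, :], -labels[None, :])
    return X, labels, is_easy

def cluster_tasks(X):
    """Split tasks into two groups using the principal eigenvector of a
    task-task similarity matrix (a stand-in construction, see lead-in)."""
    n, d = X.shape
    G = (X.T @ X) / n          # normalized agreement between pairs of tasks
    S = G ** 2                 # squaring removes the unknown label signs
    np.fill_diagonal(S, 0.0)   # drop the uninformative diagonal
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    v = np.abs(eigvecs[:, -1])             # principal eigenvector
    threshold = (v.max() + v.min()) / 2.0  # assumed midpoint threshold
    return v > threshold

rng = np.random.default_rng(0)
n, d = 100, 400
X, labels, is_easy = simulate_responses(
    n, d, r_easy=np.full(n, 0.9), r_hard=np.full(n, 0.1), rng=rng
)
predicted = cluster_tasks(X)
accuracy = np.mean(predicted == is_easy)
# Cluster names are arbitrary, so score the better of the two matchings.
print("clustering agreement:", max(accuracy, 1.0 - accuracy))
```

Once tasks are grouped, any Dawid-Skene estimator can be run on the columns of `X` belonging to each group separately, which is the two-step structure the abstract describes.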

Paper Structure

This paper contains 36 sections, 16 theorems, 155 equations, 1 figure, 4 tables, and 1 algorithm.

Introduction
Background
Problem Setting
Related Work: Dawid-Skene Model
Related Work: Task-Specific Reliability Models
Main Results
Limitations of Type-Agnostic Weighted Majority Vote
Spectral Clustering
Label Estimation for Hard-Easy Tasks
Discussion
Proof of Theorem \ref{thm:cluster_crowdsourcing}: Perfect Clustering
Concentration of the Noise Matrix N
Concentration of the Principal Eigenvector
Sufficient Condition for Perfect Clustering
Proof of Theorem \ref{thm:cluster_crowdsourcing}: Perfect Clustering
...and 21 more sections

Key Result

Proposition 3.1

Suppose $X$ is drawn from the hard-easy model and the reliability vectors $r_e, r_h$ are known. For any weight vector $w = w(r_e, r_h)$, the probability of error on task $j$ of type $k \in \{e, h\}$ is upper-bounded in terms of an error exponent $\varphi_{n}(w,r_k)$. The displayed bound and the expression for $\varphi_{n}(w,r_k)$ are missing from this extract.
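To make the role of the weight vector concrete, the sketch below compares a single type-agnostic weight vector with per-type weights in a simulated two-type model. It is a hedged illustration, not the proposition itself: the reliabilities, the rule for the type-agnostic weights (log-odds of each worker's average reliability), and the helper names are assumptions. The log-odds weights $\log\frac{1+r}{1-r}$ are the classical optimal weights for the one-coin Dawid-Skene model, assumed here to be what the TL;DR calls NP-WMV.

```python
import numpy as np

def log_odds_weights(r):
    """Weights log((1 + r) / (1 - r)) for reliabilities r in (-1, 1)."""
    r = np.clip(r, -0.999, 0.999)  # guard against division by zero
    return np.log((1.0 + r) / (1.0 - r))

def weighted_majority_vote(X, w):
    """Sign of the weighted vote per task; ties are broken toward +1."""
    return np.where(w @ X >= 0, 1, -1)

rng = np.random.default_rng(1)
n, d = 20, 4000
# Assumed skills: the first 10 workers are strong on easy tasks only,
# the last 10 are strong on hard tasks only.
r_easy = np.r_[np.full(10, 0.8), np.full(10, 0.1)]
r_hard = np.r_[np.full(10, 0.05), np.full(10, 0.6)]

labels = rng.choice([-1, 1], size=d)
is_easy = np.arange(d) < d // 2  # types taken as known (e.g. after clustering)
r = np.where(is_easy[None, :], r_easy[:, None], r_hard[:, None])
X = np.where(rng.random((n, d)) < (1.0 + r) / 2.0, labels, -labels)

# Type-agnostic: one weight vector for all tasks (one hypothetical choice).
agnostic = weighted_majority_vote(X, log_odds_weights((r_easy + r_hard) / 2.0))

# Type-specific: a separate weight vector for each task type.
specific = np.empty(d, dtype=int)
specific[is_easy] = weighted_majority_vote(X[:, is_easy], log_odds_weights(r_easy))
specific[~is_easy] = weighted_majority_vote(X[:, ~is_easy], log_odds_weights(r_hard))

err_agnostic = np.mean(agnostic != labels)
err_specific = np.mean(specific != labels)
print("type-agnostic error:", err_agnostic)
print("type-specific error:", err_specific)
```

In this setup the single weight vector cannot favor the right workers on both task types at once, which is the intuition behind needing different weights for different types.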

Figures (1)

Figure 1: Eigenspectrum of $T$ for different datasets: (a) Bluebird, (b) TREC, (c) Dog, (d) Duck, (e) RTE, and (f) Temp. For each plot, the y-axis represents the eigenvalues, and the x-axis represents the corresponding index of each eigenvalue.
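As a rough guide to reading such eigenspectrum plots, the sketch below builds a synthetic two-type similarity matrix and prints its leading eigenvalues. The matrix $T$ is not defined in this excerpt, so the block-constant matrix plus symmetric Gaussian noise used here is only an assumed stand-in, and the block values `a`, `b`, `c` and the noise scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 200
is_easy = np.arange(d) < d // 2

# Block-constant signal: a (easy-easy), b (easy-hard), c (hard-hard).
a, b, c = 1.0, 0.3, 0.5
both_easy = is_easy[:, None] & is_easy[None, :]
any_easy = is_easy[:, None] | is_easy[None, :]
signal = np.where(both_easy, a, np.where(any_easy, b, c))  # rank at most 2

# Symmetric perturbation standing in for sampling noise.
noise = rng.normal(scale=0.05, size=(d, d))
noise = (noise + noise.T) / 2.0

T_synthetic = signal + noise
eigvals = np.linalg.eigvalsh(T_synthetic)[::-1]  # sorted in descending order
print("top five eigenvalues:", np.round(eigvals[:5], 2))
```

With two task types the signal part has rank at most two, so two eigenvalues separate from the bulk; a visible gap of this kind in a plot of eigenvalue against index is what would suggest a two-type structure in real data.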

Theorems & Definitions (16)

Proposition 3.1: upper-bound on expected labeling error: TA-WMV
Proposition 3.2: lower-bound on expected labeling error: TA-WMV
Proposition 3.4
Lemma 3.6
Lemma 3.7
Lemma 3.10
Lemma 3.11
Theorem 3.12
Theorem 3.13
Theorem 4.1: Imperfect Clustering
...and 6 more
