The Majority Vote Paradigm Shift: When Popular Meets Optimal

Antonio Purificato, Maria Sofia Bucarelli, Anil Kumar Nelakanti, Andrea Bacciu, Fabrizio Silvestri, Amin Mantrach

TL;DR

This work analyzes when the simple Majority Vote (MV) label-aggregation rule can be theoretically optimal relative to the oracle MAP (oMAP) in crowdsourced binary labeling under annotator noise. By modeling annotator behavior with class-conditional noise matrices, the authors derive necessary and sufficient conditions for MV to match oMAP in symmetric (one-coin) and asymmetric (two-coin) settings, and they provide verifiable, high-probability certificates that rely only on estimated parameters. The study further extends the results to scenarios with perturbations in annotator reliability and multiple annotator groups, and it validates the theory with synthetic and real-data experiments, showing MV’s practical viability and speed. Taken together, the results offer principled guidance for when MV suffices and how to certify its optimality in practice, reducing the need for expensive expert labeling and complex aggregation schemes.
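The two aggregators compared above can be sketched as follows. This is a minimal sketch under several assumptions:

- Binary labels are stored in an (N, H) NumPy array.
- Annotators are conditionally independent given the true label.
- Each annotator has a class-conditional noise matrix with T[h, i, j] = P(annotator h reports j | true label i).

majority_vote simply counts votes. oracle_map adds each class's log prior to the log likelihood of the observed labels under the true (oracle) parameters. The function names and tie-breaking rules are illustrative choices, not the authors' implementation.

```python
import numpy as np


def majority_vote(labels: np.ndarray) -> np.ndarray:
    """Binary MV over the columns of an (N, H) 0/1 label matrix.

    Ties (only possible for even H) go to class 1; this is an arbitrary choice.
    """
    ones = labels.sum(axis=1)
    return (2 * ones >= labels.shape[1]).astype(int)


def oracle_map(labels: np.ndarray, nu: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Oracle MAP with known prior nu, shape (2,), and noise matrices T, shape (H, 2, 2).

    T[h, i, j] = P(annotator h reports j | true label i). Annotators are assumed
    conditionally independent given the true label. Ties go to class 0.
    """
    n_items, n_annotators = labels.shape
    log_post = np.tile(np.log(nu), (n_items, 1))  # (N, 2): log P(Y = c)
    for h in range(n_annotators):
        # T[h][:, labels[:, h]] has shape (2, N): likelihood of each item's label under c = 0, 1
        log_post += np.log(T[h][:, labels[:, h]]).T
    return log_post.argmax(axis=1)


# Illustrative simulation: every annotator shares T00 = 0.9, T11 = 0.6
rng = np.random.default_rng(0)
N, H = 10_000, 5
nu = np.array([0.7, 0.3])
T = np.tile(np.array([[0.9, 0.1], [0.4, 0.6]]), (H, 1, 1))
y = rng.choice(2, size=N, p=nu)
# P(annotator h reports 1 | y_n) = T[h, y_n, 1], arranged as an (N, H) array
p_one = T[:, y, 1].T
labels = (rng.random((N, H)) < p_one).astype(int)
print("MV accuracy:  ", (majority_vote(labels) == y).mean())
print("oMAP accuracy:", (oracle_map(labels, nu, T) == y).mean())
print("agreement:    ", (majority_vote(labels) == oracle_map(labels, nu, T)).mean())
```

With a strongly skewed prior and weak annotators, oracle_map can overrule a unanimous vote, which MV never does.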

Abstract

Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from perfect. Hence, it is common practice to aggregate labels gathered from multiple annotators to make a more confident estimate of the true label. Among many aggregation methods, the simple and well-known Majority Vote (MV) selects the class label polling the highest number of votes. Despite its importance, however, the optimality of MV's label aggregation has not been extensively studied. We address this gap by characterising the conditions under which MV achieves the theoretically optimal lower bound on label estimation error. Our results capture the tolerable limits on annotation noise under which MV can optimally recover labels for a given class distribution. This certificate of optimality offers a more principled approach to model selection for label aggregation. It is an alternative to otherwise inefficient practices, such as relying on higher-level experts or gold labels, which are marred by the same human uncertainty despite their high time and monetary costs. Experiments on both synthetic and real-world data corroborate our theoretical findings.
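For the symmetric (one-coin) case, a short derivation shows what such a tolerable noise limit can look like. It rests on these assumptions:

- H is odd.
- There are H i.i.d. annotators with accuracy p = T00 = T11.
- The prior is nu0 = P(Y = 0), with nu1 = 1 - nu0.

With k votes for class 1, the oMAP log posterior odds are log(nu1/nu0) + (2k - H) log(p/(1 - p)). MV predicts 1 exactly when 2k - H > 0. The tightest vote margins are 2k - H = ±1. Ignoring exact ties, MV therefore agrees with oMAP on every vote count if and only if |log(nu1/nu0)| < log(p/(1 - p)). This simplifies to p > max(nu0, nu1).

Under these assumptions the condition does not depend on H. This is our own simplified derivation, not necessarily the exact statement of Theorem 3.3. The sketch below checks it against brute-force enumeration.

```python
import math


def mv_matches_omap_closed_form(p: float, nu0: float) -> bool:
    """One-coin, odd-H condition derived above: annotator accuracy exceeds the larger class prior."""
    return p > max(nu0, 1.0 - nu0)


def mv_matches_omap_brute_force(p: float, nu0: float, H: int) -> bool:
    """Compare the oMAP and MV decisions on every possible vote count k = 0..H."""
    assert H % 2 == 1, "the derivation assumes an odd number of annotators"
    log_prior_odds = math.log((1.0 - nu0) / nu0)  # log(nu1 / nu0)
    log_lik_step = math.log(p / (1.0 - p))        # log(p / (1 - p))
    for k in range(H + 1):
        omap = int(log_prior_odds + (2 * k - H) * log_lik_step > 0)
        mv = int(2 * k > H)
        if omap != mv:
            return False
    return True


# Grids chosen so that p never equals max(nu0, 1 - nu0) exactly (no posterior ties)
p_grid = [0.505 + 0.01 * i for i in range(50)]
nu0_grid = [0.05 * j for j in range(1, 20)]
for H in (3, 5, 7, 9):
    for p in p_grid:
        for nu0 in nu0_grid:
            assert mv_matches_omap_closed_form(p, nu0) == mv_matches_omap_brute_force(p, nu0, H)
print("closed form agrees with brute force on the whole grid")
```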

Paper Structure

This paper contains 25 sections, 21 theorems, 143 equations, 7 figures, 7 tables.

Key Result

Proposition 3.1

The oracle MAP estimator minimizes the expected $0\text{-}1$ loss $\mathbb{E}[\mathcal{L}_{0\text{-}1}]$.
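Proposition 3.1 can be checked numerically in a simple special case. The sketch below assumes H i.i.d. two-coin annotators sharing T00 and T11. In that case the oMAP decision depends on the labels only through the number k of votes for class 1. We therefore compare it with every deterministic rule mapping k to a class.

The expected 0-1 loss is computed exactly from the joint probabilities P(Y = c, K = k) = nu_c C(H, k) q_c^k (1 - q_c)^(H - k), with q_0 = 1 - T00 and q_1 = T11. The parameter values are illustrative, and the helper names are hypothetical.

```python
from itertools import product
from math import comb


def joint_probs(nu0: float, T00: float, T11: float, H: int) -> list[list[float]]:
    """P[c][k] = P(Y = c, K = k) for H i.i.d. two-coin annotators."""
    nu = (nu0, 1.0 - nu0)
    q = (1.0 - T00, T11)  # q[c] = P(an annotator reports 1 | Y = c)
    return [[nu[c] * comb(H, k) * q[c] ** k * (1.0 - q[c]) ** (H - k) for k in range(H + 1)]
            for c in (0, 1)]


def expected_01_loss(decisions, P) -> float:
    """Expected 0-1 loss of a count-based rule given as a sequence decisions[k] in {0, 1}."""
    return sum(P[c][k] for k in range(len(decisions)) for c in (0, 1) if decisions[k] != c)


def omap_decisions(P) -> list[int]:
    """oMAP: pick the class with the larger joint (equivalently posterior) probability."""
    return [int(P[1][k] > P[0][k]) for k in range(len(P[0]))]


def mv_decisions(H: int) -> list[int]:
    """MV on the vote count; ties (even H) go to class 0."""
    return [int(2 * k > H) for k in range(H + 1)]


nu0, T00, T11, H = 0.9, 0.7, 0.7, 3
P = joint_probs(nu0, T00, T11, H)
loss_omap = expected_01_loss(omap_decisions(P), P)
loss_mv = expected_01_loss(mv_decisions(H), P)
# Exhaustive search over every deterministic rule mapping k -> {0, 1}
best = min(expected_01_loss(rule, P) for rule in product((0, 1), repeat=H + 1))
print(f"oMAP loss = {loss_omap:.4f}, MV loss = {loss_mv:.4f}, best rule = {best:.4f}")
# prints: oMAP loss = 0.0900, MV loss = 0.2160, best rule = 0.0900
```

Here the skewed prior (nu0 = 0.9) makes oMAP predict class 1 only on a unanimous vote. MV follows any 2-of-3 majority and pays for it in expected loss.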

Figures (7)

  • Figure 1: Flowchart summarizing the conditions under which MV is optimal, i.e., under which its label estimates match those of oMAP.
  • Figure 2: Illustration of Theorem 3.4 on simulations. We analyze the optimality of MV compared to oMAP by checking whether the theorem's inequality condition is satisfied for ten different $\nu_0$ values as $T_{11}$ and $T_{00}$ vary between $0.5$ and $1$. Blue points mark the settings where MV matches oMAP.
  • Figure 3: Illustrations on simulated data of the theorem for binary classes with shared coins. The main panel compares MV to oMAP, and two further panels show similar plots for IWMV and for IAA. A heatmap panel presents the same MV vs oMAP comparison as the main panel. Curves in another panel show how the probability gap falls as $H$ grows for four distinct parameter settings, of which only the orange one satisfies the optimality condition for MV. A histogram panel shows the percentage of labels aggregated using MV that equal those aggregated using oMAP.
  • Figure 4: Non-red bars show the fraction of experiments in which checking Theorem 3.4 with parameters estimated by the candidate methods agrees with checking it using the true $(\nu,T)$. Only cases where the theorem holds under the true parameters are considered. Red bars show the same agreement for the theorem stated in terms of estimated quantities. Synthetic data have various sample sizes $N$, and the True Positive Rate is averaged over multiple $T$ values, with $\nu_0=0.5$.
  • Figure 5: Confusion matrices comparing the performance of the empirical method with the oracle results. The experiments use different $T$ matrices and class distributions $\nu$, with $H=3$ and $N=10^6$. At this value of $N$, the empirical approach performs well with the EBCC method, and slightly less well with the IAA estimation approach.
  • ...and 2 more figures
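A sweep in the spirit of Figure 2 can be sketched by brute force instead of through the theorem's inequality, which is not reproduced here. The sketch assumes H i.i.d. two-coin annotators with shared T00 and T11. Under that assumption, oMAP's decision depends only on the vote count k, through the log posterior odds:

log(nu1/nu0) + k log(T11/(1 - T00)) + (H - k) log((1 - T11)/T00)

The hypothetical helper mv_equals_omap checks whether that decision matches MV for every k. The choice H = 5, the grid resolution, and the nu0 values are illustrative and need not match the paper's simulations.

```python
import numpy as np


def mv_equals_omap(nu0: float, T00: float, T11: float, H: int) -> bool:
    """True if oMAP and MV agree on every vote count k = 0..H (i.i.d. two-coin annotators)."""
    k = np.arange(H + 1)
    log_odds = (np.log((1.0 - nu0) / nu0)              # log prior odds of class 1
                + k * np.log(T11 / (1.0 - T00))         # contribution of the k votes for 1
                + (H - k) * np.log((1.0 - T11) / T00))  # contribution of the H - k votes for 0
    omap = log_odds > 0   # ties go to class 0
    mv = 2 * k > H        # ties (even H) go to class 0
    return bool(np.array_equal(omap, mv))


# T00, T11 strictly inside (0.5, 1) so that every logarithm is finite
grid = np.linspace(0.51, 0.99, 49)
for nu0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    ok = np.array([[mv_equals_omap(nu0, t00, t11, H=5) for t00 in grid] for t11 in grid])
    print(f"nu0 = {nu0}: MV matches oMAP on {ok.mean():.1%} of the (T00, T11) grid")
```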

Theorems & Definitions (34)

  • Proposition 3.1
  • Lemma 3.1: Noise transition matrix $T^\textrm{MV}$ (also Lemma 2.1 in wei2023aggregate)
  • Lemma 3.2: Noise transition matrix $T^\textrm{oMAP}$
  • Theorem 3.3: MV optimality criterion for one-parameter $T$ for binary tasks
  • Theorem 3.4: MV optimality criterion for two-parameter $T$ for binary tasks
  • Theorem 3.5
  • Lemma A.1
  • Remark
  • Theorem A.2
  • Proof
  • ...and 24 more
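Reading $T^\textrm{MV}$ and $T^\textrm{oMAP}$ from Lemmas 3.1 and 3.2 as matrices with entries P(aggregated label = j | Y = i), both can be computed from binomial probabilities of the vote count k in a simple special case. This is a minimal sketch assuming H i.i.d. two-coin annotators with shared T00 and T11; the lemmas themselves may cover more general settings. The function names are hypothetical. The last loop also recovers each aggregator's expected 0-1 loss from its matrix.

```python
from math import comb, log


def transition_matrix(rule, T00: float, T11: float, H: int) -> list[list[float]]:
    """M[i][j] = P(rule(K) = j | Y = i), where K counts votes for class 1 among H annotators."""
    q = (1.0 - T00, T11)  # q[i] = P(an annotator reports 1 | Y = i)
    M = [[0.0, 0.0], [0.0, 0.0]]
    for i in (0, 1):
        for k in range(H + 1):
            M[i][rule(k)] += comb(H, k) * q[i] ** k * (1.0 - q[i]) ** (H - k)
    return M


def mv_rule(H: int):
    """MV on the vote count; ties (even H) go to class 0."""
    return lambda k: int(2 * k > H)


def omap_rule(nu0: float, T00: float, T11: float, H: int):
    """oMAP on the vote count, using the true prior and noise parameters."""
    def rule(k: int) -> int:
        log_odds = (log((1.0 - nu0) / nu0) + k * log(T11 / (1.0 - T00))
                    + (H - k) * log((1.0 - T11) / T00))
        return int(log_odds > 0)
    return rule


nu0, T00, T11, H = 0.9, 0.7, 0.7, 3
T_mv = transition_matrix(mv_rule(H), T00, T11, H)
T_omap = transition_matrix(omap_rule(nu0, T00, T11, H), T00, T11, H)
for name, M in (("T^MV", T_mv), ("T^oMAP", T_omap)):
    # Expected 0-1 loss = nu0 * P(output 1 | Y = 0) + nu1 * P(output 0 | Y = 1)
    err = nu0 * M[0][1] + (1.0 - nu0) * M[1][0]
    print(name, [[round(x, 4) for x in row] for row in M], "error:", round(err, 4))
```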