Spot Check Equivalence: an Interpretable Metric for Information Elicitation Mechanisms
Shengwei Xu, Yichi Zhang, Paul Resnick, Grant Schoenebeck
TL;DR
This paper introduces Spot Check Equivalence (SCE), an interpretable metric for comparing information elicitation mechanisms in crowdsourcing that unifies the effects of Measurement Integrity (MI) and Sensitivity on motivational proficiency. It formalizes the Information Elicitation Context, defines its top-level components (Agent, Application, Mechanism), and provides two ways to compute SCE: with ground truth, via MI and Sensitivity, and without ground truth, via a bootstrap-like approach. The authors prove a main unification result linking MI and Sensitivity under Gaussian-like assumptions and independence, and validate the framework through agent-based simulations across diverse task and payment settings, covering spot-checking and various peer prediction mechanisms. The work offers practical guidance for designing efficient incentive schemes and a rigorous, testable framework for evaluating the trade-offs between spot-checking and peer prediction in different data-elicitation contexts.
Abstract
Because high-quality data is like oxygen for AI systems, effectively eliciting information from crowdsourcing workers has become a first-order problem for developing high-performance machine learning algorithms. Two prevalent paradigms, spot-checking and peer prediction, enable the design of mechanisms to evaluate and incentivize high-quality data from human labelers. So far, at least three metrics have been proposed to compare the performance of these techniques [33, 8, 3]. However, different metrics lead to divergent and even contradictory results in various contexts. In this paper, we harmonize these divergent stories, showing that two of these metrics are actually the same within certain contexts and explaining the divergence of the third. Moreover, we unify these different contexts by introducing Spot Check Equivalence, which offers an interpretable metric for the effectiveness of a peer prediction mechanism. Finally, we present two approaches to compute spot check equivalence in various contexts, where simulation results verify the effectiveness of our proposed metric.
