Algorithmic Contract Design for Crowdsourced Ranking
Kiriaki Frangias, Andrew Lin, Ellen Vitercik, Manolis Zampetakis
TL;DR
This paper studies how to rank a set of items using crowdsourced pairwise comparisons from strategic agents. It introduces a mechanism based on contract theory in which the principal assigns comparisons, verifies a small subset, and pays agents based on their verified accuracy. The mechanism requires only $O(\log s)$ verifications and yields higher utility for the principal than ranking the items alone. A key technical advance is distributing comparisons via a connection to the social golfer problem, which gives a per-agent workload of $\tilde{O}(n^{3/2}/s)$ with each pair assigned to exactly $r$ agents. The CrowdSort algorithm then identifies good agents and aggregates their inputs to recover the ground-truth ordering with high probability. Empirical results demonstrate robustness to model misspecification and show that the mechanism can outperform self-ranking under plausible disutility and reliability settings, offering scalable, incentive-compatible crowdsourced ranking for applications in search, language model feedback, and peer grading.
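To make the assignment constraint concrete, the following is a minimal sketch of a naive round-robin scheme that gives every pair to exactly $r$ distinct agents with balanced workloads. It is not the paper's social-golfer-based construction and does not achieve the $\tilde{O}(n^{3/2}/s)$ workload, since it assigns all $\binom{n}{2}$ pairs. The function names `assign_pairs` and `workloads` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def assign_pairs(n, s, r):
    """Naive baseline (an assumption, not the paper's construction):
    give each of the C(n, 2) item pairs to exactly r distinct agents,
    cycling through agents 0..s-1 in round-robin order."""
    if r > s:
        raise ValueError("need at least r agents to cover a pair r times")
    assignment = {}
    t = 0  # running counter of assignment slots handed out so far
    for pair in combinations(range(n), 2):
        # r consecutive slots modulo s are distinct because r <= s
        assignment[pair] = [(t + j) % s for j in range(r)]
        t += r
    return assignment


def workloads(assignment, s):
    """Number of comparisons handed to each agent."""
    load = Counter(a for agents in assignment.values() for a in agents)
    return [load[a] for a in range(s)]


if __name__ == "__main__":
    a = assign_pairs(n=6, s=5, r=3)
    print(workloads(a, s=5))
```

Because slots are handed out cyclically, per-agent workloads differ by at most one; the total work is $r\binom{n}{2}$ comparisons spread over $s$ agents.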
Abstract
Ranking is fundamental to many areas, such as search engine optimization, human feedback for language models, and peer grading. Crowdsourcing, which is often used for these tasks, requires proper incentivization to ensure accurate inputs. In this work, we draw on the field of \emph{contract theory} from economics to propose a novel mechanism that enables a \emph{principal} to accurately rank a set of items by incentivizing agents to provide pairwise comparisons of the items. Our mechanism implements these incentives by verifying a subset of each agent's comparisons, a task we assume to be costly. Each agent is compensated (for example, monetarily or with class credit) based on the accuracy of these verified comparisons. Our mechanism achieves the following guarantees: (1) it only requires the principal to verify $O(\log s)$ comparisons, where $s$ is the total number of agents, and (2) it provably achieves higher total utility for the principal than ranking the items herself with no crowdsourcing.
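The verify-then-pay idea can be illustrated with a short sketch. It assumes a simple threshold contract (a fixed payment if the verified accuracy meets a cutoff, nothing otherwise). The abstract only states that compensation depends on the accuracy of the verified comparisons, so the payment rule, the constant `c`, the `threshold`, and the function names are all hypothetical choices for illustration.

```python
import math
import random


def num_verifications(s, c=2.0):
    """Verification budget growing like log(s).
    The constant c is a hypothetical choice, not taken from the paper."""
    return max(1, math.ceil(c * math.log(s)))


def spot_check_payment(reported, true_rank, k, threshold=0.9, payment=1.0, rng=None):
    """Pay an agent based on a random spot check of its comparisons.

    reported:  dict mapping a pair (i, j) to True if the agent says
               item i ranks ahead of item j.
    true_rank: dict mapping item -> position (lower is better); it stands
               in for the principal's costly verification.
    The threshold contract used here is an assumption for illustration.
    """
    rng = rng or random.Random()
    pairs = list(reported)
    if not pairs:
        return 0.0
    # Verify at most k of the agent's comparisons, chosen uniformly at random
    checked = rng.sample(pairs, min(k, len(pairs)))
    correct = sum(
        reported[(i, j)] == (true_rank[i] < true_rank[j]) for (i, j) in checked
    )
    accuracy = correct / len(checked)
    return payment if accuracy >= threshold else 0.0


if __name__ == "__main__":
    true_rank = {0: 0, 1: 1, 2: 2, 3: 3}
    honest = {(i, j): True for i in range(4) for j in range(i + 1, 4)}
    k = num_verifications(s=100)
    print(spot_check_payment(honest, true_rank, k, rng=random.Random(0)))
```

An agent cannot tell in advance which comparisons will be checked, so under a contract of this kind the payment depends on the accuracy of all of its comparisons in expectation, while the principal only pays the verification cost for the sampled ones.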
