Optimal level set estimation for non-parametric tournament and crowdsourcing problems
Maximilian Graf, Alexandra Carpentier, Nicolas Verzelen
TL;DR
The paper addresses optimal level-set estimation in non-parametric tournament and crowdsourcing settings where the data matrix $M$ is bi-isotonic up to row/column permutations. It introduces SoHLoB, a polynomial-time algorithm that localizes large entries via a hierarchical ranking framework built on envelopes and multiple noisy views, achieving minimax-optimal rates for the classification loss and permutation loss up to polylog factors. A key contribution is showing minimax lower bounds that match the algorithmic guarantees, and extending the approach to multiple thresholds and finite-valued matrices, thereby indicating no computational gap in these regimes. The work also connects to noisy-sorting literature and provides a detailed algorithmic and analytical blueprint (Envelope, ScanAndUpdate, hierarchical sorting tree) for efficient level-set recovery with practical impact on allocating workers to questions in crowdsourcing and ranking players in tournaments.
Abstract
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions. In this paper, we assume that both the experts and the questions can be ordered, namely that the matrix $M$ containing the probability that expert $i$ answers question $j$ correctly is bi-isotonic up to a permutation of its rows and columns. When $n=d$, this also encompasses the strongly stochastic transitive (SST) model from the tournament literature. Here, we focus on the relevant problem of distinguishing small entries of $M$ from large entries of $M$, which is key in crowdsourcing for efficient allocation of workers to questions. More precisely, we aim at recovering one (or several) level set $p$ of the matrix up to a precision $h$, namely recovering, respectively, the sets of positions $(i,j)$ in $M$ such that $M_{ij}>p+h$ and $M_{ij}<p-h$. We consider, as a loss measure, the number of misclassified entries. As our main result, we construct an efficient polynomial-time algorithm that turns out to be minimax optimal for this classification problem. This contrasts sharply with the existing literature on the SST model where, for the stronger reconstruction loss, statistical-computational gaps have been conjectured. More generally, this sheds light on the nature of statistical-computational gaps for permutation models.
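To make the target sets and the loss concrete, the following is a minimal Python sketch of the level sets $\{(i,j): M_{ij}>p+h\}$ and $\{(i,j): M_{ij}<p-h\}$ and of the count of misclassified entries. It is an illustration only and not the paper's algorithm: the function names (`level_sets`, `classification_loss`), the toy matrix, the choice of "nondecreasing in both indices" as the bi-isotonic convention, and the decision to never penalize entries inside the band $[p-h, p+h]$ are all assumptions made here, and the labels are computed from the noiseless $M$ rather than from partial noisy observations.

```python
import random


def level_sets(M, p, h):
    """Return (upper, lower): positions with M[i][j] > p + h and M[i][j] < p - h."""
    upper, lower = set(), set()
    for i, row in enumerate(M):
        for j, value in enumerate(row):
            if value > p + h:
                upper.add((i, j))
            elif value < p - h:
                lower.add((i, j))
    return upper, lower


def classification_loss(M, labels, p, h):
    """Count entries clearly above the level labelled 0, plus entries clearly
    below the level labelled 1. Entries in the band [p - h, p + h] are ignored
    (an assumption of this sketch)."""
    upper, lower = level_sets(M, p, h)
    errors = sum(1 for (i, j) in upper if labels[i][j] != 1)
    errors += sum(1 for (i, j) in lower if labels[i][j] != 0)
    return errors


# Toy bi-isotonic matrix: entries are nondecreasing in both i and j.
n, d = 4, 3
M = [[(i + j + 1) / (n + d) for j in range(d)] for i in range(n)]

# Hide the ordering behind unknown row and column permutations.
rng = random.Random(0)
rows, cols = list(range(n)), list(range(d))
rng.shuffle(rows)
rng.shuffle(cols)
M_perm = [[M[r][c] for c in cols] for r in rows]

p, h = 0.5, 0.1
upper, lower = level_sets(M_perm, p, h)

# Two candidate labelings: thresholding the noiseless matrix, and "all large".
threshold_labels = [[1 if v > p else 0 for v in row] for row in M_perm]
all_ones = [[1] * d for _ in range(n)]

print(len(upper), len(lower))
print(classification_loss(M_perm, threshold_labels, p, h))
print(classification_loss(M_perm, all_ones, p, h))
```

The sizes of the two sets do not depend on the permutations, which only change where the entries sit. Labelling every entry as large is charged exactly once for each entry lying below $p-h$.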
