Inference of rankings planted in random tournaments
Dmitriy Kunisky, Daniel A. Spielman, Xifan Yu
TL;DR
This work analyzes the problem of inferring a hidden ranking from noisy pairwise comparisons. The comparisons are encoded as a random tournament whose edge directions are biased toward a hidden permutation $\pi$, with signal strength $\gamma$. It establishes detection and recovery thresholds that are tight up to constants: strong detection is possible when $\gamma=\omega(n^{-3/4})$ and strong recovery when $\gamma=\omega(n^{-1/2})$, which reveals a detection-recovery gap. It also shows that a simple Ranking By Wins algorithm achieves optimal or near-optimal performance for recovery and alignment in the planted model. The authors connect alignment maximization to maximum likelihood estimation: the MLE maximizes the alignment objective, but computing it is NP-hard in the worst case. In the planted regime, by contrast, Ranking By Wins gives a near-ML solution and achieves a $(1-o(1))$-approximation to the maximum alignment for $\gamma=\omega(n^{-1/2})$. Methodologically, the paper uses low-degree polynomial detectors, comparisons with spectral methods, Fourier-analytic tools, and Berry–Esseen-style concentration to bound detection and recovery rigorously, complemented by information-theoretic lower bounds via KL divergences. The results clarify when efficient algorithms can reliably recover latent rankings from noisy comparisons and exhibit a clear detection-recovery separation for ranking problems in high-dimensional statistics.
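To make the planted model and the Ranking By Wins estimator concrete, here is a minimal Python sketch under stated assumptions. The function names `sample_tournament`, `rank_by_wins`, `footrule`, `kendall_tau`, and `alignment` are hypothetical. The parameterization is an illustrative assumption that may differ from the paper's normalization by a constant factor: the better-ranked item wins each comparison independently with probability $(1+\gamma)/2$. Ranks are 0-indexed, with rank 0 the best. Under this assumed model, the log-likelihood of a candidate ranking is increasing in its alignment (the number of edges pointing from better to worse), so the MLE maximizes `alignment`. The sketch does not attempt that NP-hard maximization.

```python
import random


def sample_tournament(pi, gamma, rng):
    """Sample a planted tournament as a 0/1 win matrix.

    Assumed parameterization (illustrative): pi[v] is the rank of vertex v
    (0 = best), and in each pair the better-ranked vertex wins independently
    with probability (1 + gamma) / 2. gamma = 0 gives a uniform tournament.
    """
    n = len(pi)
    wins = [[0] * n for _ in range(n)]  # wins[u][v] == 1 iff edge u -> v
    for u in range(n):
        for v in range(u + 1, n):
            better, worse = (u, v) if pi[u] < pi[v] else (v, u)
            if rng.random() < (1 + gamma) / 2:
                wins[better][worse] = 1
            else:
                wins[worse][better] = 1
    return wins


def rank_by_wins(wins):
    """Rank vertices by out-degree, most wins first (stable sort: ties by index)."""
    n = len(wins)
    out_deg = [sum(row) for row in wins]
    order = sorted(range(n), key=lambda v: -out_deg[v])
    est = [0] * n
    for r, v in enumerate(order):
        est[v] = r  # est[v] is the estimated rank of vertex v
    return est


def footrule(p, q):
    """Spearman's footrule: sum over vertices of |p[v] - q[v]|."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kendall_tau(p, q):
    """Kendall's tau distance: number of pairs ordered differently by p and q."""
    n = len(p)
    return sum(
        1
        for u in range(n)
        for v in range(u + 1, n)
        if (p[u] < p[v]) != (q[u] < q[v])
    )


def alignment(wins, rank):
    """Number of edges pointing from the better- to the worse-ranked vertex."""
    n = len(wins)
    return sum(
        wins[u][v] for u in range(n) for v in range(n) if rank[u] < rank[v]
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    pi = list(range(n))
    rng.shuffle(pi)  # hidden ranking
    for gamma in (0.0, 0.5):
        est = rank_by_wins(sample_tournament(pi, gamma, rng))
        print(gamma, footrule(pi, est) / n**2,
              kendall_tau(pi, est) / (n * (n - 1) / 2))
```

With $\gamma = 0$ the estimate is unrelated to $\pi$, so the normalized footrule should sit near $1/3$, the value expected for an independent uniformly random permutation. With $\gamma = 0.5$, well above $n^{-1/2} \approx 0.07$ for $n = 200$, both normalized distances should be small.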
Abstract
We consider the problem of inferring an unknown ranking of $n$ items from a random tournament on $n$ vertices whose edge directions are correlated with the ranking. We establish, in terms of the strength of these correlations, the computational and statistical thresholds for detection (deciding whether an observed tournament is purely random or has edge directions correlated with a hidden ranking) and recovery (estimating the hidden ranking with small error in Spearman's footrule or Kendall's tau metric on permutations). Notably, we find that this problem provides a new instance of a detection-recovery gap: solving the detection problem requires much weaker correlations than solving the recovery problem. In establishing these thresholds, we also identify simple algorithms for detection (thresholding a degree-2 polynomial) and recovery (outputting a ranking by the number of "wins" of a tournament vertex, i.e., its out-degree) that achieve optimal performance up to constants in the correlation strength. For detection, we find that the above low-degree polynomial algorithm is superior to a natural spectral algorithm. We also find that whenever strong recovery of the hidden ranking is possible (i.e., estimation with vanishing error in the above metrics), the above "Ranking By Wins" algorithm not only achieves it but also outputs a close approximation of the maximum likelihood estimator, a task that is NP-hard in the worst case.
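The degree-2 detection test can be illustrated with one natural degree-2 polynomial in the edge signs. This is a sketch under stated assumptions: we do not claim it is exactly the polynomial analyzed in the paper, and `sample_tournament` and `degree2_statistic` are hypothetical names. The sketch uses the same illustrative parameterization as above, in which the better-ranked vertex wins with probability $(1+\gamma)/2$. Encode each edge as $A_{uv} = \pm 1$ and set $S_v = \sum_u A_{vu} = 2\,\mathrm{outdeg}(v) - (n-1)$. The statistic is $T = \sum_v \big(S_v^2 - (n-1)\big) = \sum_v \sum_{u \neq w} A_{vu} A_{vw}$. Under the null its terms are uncorrelated, so $\mathbb{E}[T] = 0$ and $\mathrm{Var}(T) = 2n(n-1)(n-2)$. Under the assumed planted model, $\mathbb{E}[T] \approx \gamma^2 n^3/3$. This dominates the null standard deviation of order $n^{3/2}$ when $\gamma \gg n^{-3/4}$, which is heuristically consistent with the detection threshold.

```python
import math
import random


def sample_tournament(pi, gamma, rng):
    """Planted tournament as an antisymmetric +/-1 matrix.

    Assumed parameterization (illustrative): pi[v] is the rank of v (0 = best)
    and the better-ranked vertex of each pair wins w.p. (1 + gamma) / 2.
    A[u][v] = +1 if u beats v, -1 if v beats u, and A[v][v] = 0.
    """
    n = len(pi)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            sign = 1 if pi[u] < pi[v] else -1  # +1 if u is better ranked
            if rng.random() >= (1 + gamma) / 2:
                sign = -sign  # the worse-ranked vertex wins this comparison
            A[u][v], A[v][u] = sign, -sign
    return A


def degree2_statistic(A):
    """Standardized T = sum_v (S_v**2 - (n - 1)), where S_v = sum_u A[v][u].

    T is a degree-2 polynomial in the edge signs with null mean 0 and null
    variance 2 n (n - 1) (n - 2); the return value is T divided by the null
    standard deviation.
    """
    n = len(A)
    T = sum(sum(row) ** 2 - (n - 1) for row in A)
    return T / math.sqrt(2 * n * (n - 1) * (n - 2))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    pi = list(range(n))
    rng.shuffle(pi)  # hidden ranking
    for gamma in (0.0, 0.15):
        print(gamma, degree2_statistic(sample_tournament(pi, gamma, rng)))
```

A test would then declare "planted" when the standardized value exceeds a threshold. Because the number of cyclic triangles equals $\binom{n}{3} - \sum_v \binom{d_v}{2}$, where $d_v$ is the out-degree, this statistic is an affine function of that count. A large value therefore means unusually few cyclic triangles.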
