Learning when to rank: Estimation of partial rankings from sparse, noisy comparisons
Sebastian Morel-Balbi, Alec Kirkley
TL;DR
The paper tackles ranking from sparse, noisy pairwise comparisons by introducing a principled Bayesian nonparametric framework that allows partial rankings (ties). It couples a flexible likelihood (BT-based) with a hierarchical prior over rank partitions, and implements a fast agglomerative MAP algorithm to infer partial rankings that are more parsimonious than full rankings when data are limited. Across synthetic and real datasets, the approach reveals regimes where partial rankings are advantageous and demonstrates its utility in a case study of CS faculty hiring, where elite-tier structures and mobility patterns emerge. The work provides a scalable alternative to full ranking models, with potential extensions to uncertainty quantification, dynamic rankings, and edge-structure modeling.
Abstract
Ranking items based on pairwise comparisons is common, from using match outcomes to rank sports teams to using purchase or survey data to rank consumer products. Statistical inference-based methods such as the Bradley-Terry model, which extract rankings based on an underlying generative model, have emerged as flexible and powerful tools to tackle ranking in empirical data. In situations with limited and/or noisy comparisons, it is often challenging to confidently distinguish the performance of different items based on the evidence available in the data. However, most inference-based ranking methods choose to assign each item to a unique rank or score, suggesting a meaningful distinction when there is none. Here, we develop a principled nonparametric Bayesian method, adaptable to any statistical ranking method, for learning partial rankings (rankings with ties) that distinguishes among the ranks of different items only when there is sufficient evidence available in the data. We develop a fast agglomerative algorithm to perform Maximum A Posteriori (MAP) inference of partial rankings under our framework and examine the performance of our method on a variety of real and synthetic network datasets, finding that it frequently gives a more parsimonious summary of the data than traditional ranking, particularly when observations are sparse.
