Learning when to rank: Estimation of partial rankings from sparse, noisy comparisons

Sebastian Morel-Balbi, Alec Kirkley

TL;DR

The paper tackles ranking from sparse, noisy pairwise comparisons by introducing a principled Bayesian nonparametric framework that allows partial rankings (ties). It couples a flexible likelihood (BT-based) with a hierarchical prior over rank partitions, and implements a fast agglomerative MAP algorithm to infer partial rankings that are more parsimonious than full rankings when data are limited. Across synthetic and real datasets, the approach reveals regimes where partial rankings are advantageous and demonstrates its utility in a case study of CS faculty hiring, where elite-tier structures and mobility patterns emerge. The work provides a scalable alternative to full ranking models, with potential extensions to uncertainty quantification, dynamic rankings, and edge-structure modeling.

Abstract

Ranking items based on pairwise comparisons is common, from using match outcomes to rank sports teams to using purchase or survey data to rank consumer products. Statistical inference-based methods such as the Bradley-Terry model, which extract rankings based on an underlying generative model, have emerged as flexible and powerful tools to tackle ranking in empirical data. In situations with limited and/or noisy comparisons, it is often challenging to confidently distinguish the performance of different items based on the evidence available in the data. However, most inference-based ranking methods choose to assign each item to a unique rank or score, suggesting a meaningful distinction when there is none. Here, we develop a principled nonparametric Bayesian method, adaptable to any statistical ranking method, for learning partial rankings (rankings with ties) that distinguishes among the ranks of different items only when there is sufficient evidence available in the data. We develop a fast agglomerative algorithm to perform Maximum A Posteriori (MAP) inference of partial rankings under our framework and examine the performance of our method on a variety of real and synthetic network datasets, finding that it frequently gives a more parsimonious summary of the data than traditional ranking, particularly when observations are sparse.

Paper Structure

This paper contains 22 sections, 2 theorems, 55 equations, 22 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Let $N \in \mathbb{Z}^+$ and let $\{n_r\}_{r=1}^R \subset \mathbb{Z}^+$ be a set of positive integers such that $\sum_{r = 1}^R n_r = N$. Then the multinomial coefficient $\binom{N}{n_1, \ldots, n_R}$ is maximized when $n_r = n = N/R$ for all $r$.
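
As a quick numerical sanity check of Proposition 1 (not a proof, and not code from the paper), the sketch below enumerates every way of splitting $N$ items into $R$ non-empty groups and reports the group sizes with the largest multinomial coefficient. The helper names `multinomial` and `size_multisets`, and the choice $N = 12$, $R = 3$, are illustrative assumptions.

```python
from itertools import combinations_with_replacement
from math import factorial


def multinomial(sizes):
    """Multinomial coefficient N! / (n_1! ... n_R!) with N = sum(sizes)."""
    coeff = factorial(sum(sizes))
    for n in sizes:
        # Each partial quotient is an integer, so floor division is exact.
        coeff //= factorial(n)
    return coeff


def size_multisets(N, R):
    """Yield every sorted tuple of R positive integers that sums to N."""
    # A single group can hold at most N - R + 1 items if all groups are non-empty.
    for parts in combinations_with_replacement(range(1, N - R + 2), R):
        if sum(parts) == N:
            yield parts


# Hypothetical example values: 12 items split into 3 groups.
N, R = 12, 3
best = max(size_multisets(N, R), key=multinomial)
print(best, multinomial(best))  # prints (4, 4, 4) 34650
```

For these values the balanced split $n_r = N/R = 4$ attains the maximum, in line with the proposition; unbalanced splits such as $(3, 4, 5)$ give a strictly smaller coefficient.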

Figures (22)

  • Figure 1: Partial rankings in a small example network. (a) Rankings inferred by the BT model and (b) the PR algorithm, for a dataset capturing hierarchical relationships in a pack of wolves. The nodes are labeled according to their inferred ranking. The distance along the y-axis and the node colors are proportional to the inferred strength $\pi_i$ of each node $i$, with the strongest nodes placed at the top. In this case, there was not enough statistical evidence in the edges to justify separating the node ranks $\{2,3\}$, $\{6,7,8,9\}$, and $\{11,12\}$ on the left-hand side, so the partial ranking method grouped these nodes together into the same partial rankings.
  • Figure 2: (a) Heatmap of the number of rankings $R$ inferred by the partial rankings model, as a function of $(\sigma, \langle m\rangle)$ for a dataset of $N=50$ nodes planted into $R = 3$ rankings as described by the synthetic model in Sec. \ref{sec:synthetic}. (b) Heatmap of the log posterior odds ratio (Eq. \ref{eq:PORbtpr}) between the BT and the partial rankings model across the simulations. Positive values indicate a preference for the partial rankings model, and negative values a preference for the BT model.
  • Figure 3: (a) Average number of rankings inferred by the best-performing model (blue) and our partial rankings model (red) as a function of the average number of matches per player pair $\langle m \rangle$ for the case in which no planted partial rankings are present. (b) Log posterior odds ratio (normalized per node) between the two models. Negative values of this ratio indicate a preference for the BT model and positive values a preference for the partial rankings model. All results were obtained by averaging over $20$ different simulations from the synthetic network model of Sec. \ref{sec:synthetic}, and error bars indicate $2$ standard errors from the mean. Except near the transition point, error bars are smaller than the marker size.
  • Figure 4: Partial rankings in real networks of pairwise comparisons. (a) (top) Rescaled number of ranks per node inferred by the best-performing model and (bottom) log posterior odds ratio per node, both as a function of $\langle m\rangle$ for all real-world networks considered in the study (see Table \ref{tab:datasets}). The colors of the points indicate the different categories to which the datasets belong, while the shape of the markers indicates which model (PR or BT) emerged as the best-performing method (in terms of the posterior odds ratio). Positive values of the log posterior odds ratio indicate evidence in favor of the PR model; negative values indicate evidence in favor of the BT model. (b) Effective number of ranks, $R^*$ (Eq. \ref{eq:R_eff}), as a function of the number of unique ranks $R$ inferred by the PR algorithm. The black dashed line represents the line $R^* = R$. (c) Number of rankings inferred via the partial rankings algorithm as a function of the number of rankings inferred via mean shift clustering. Point colors indicate the $\tau_B$ score between the two inferred rankings. The black dashed line represents the line $R_{PR} = R_{MS}$. (d) Number of nodes of each network as a function of the number of rankings $R$ inferred by the partial rankings algorithm. Point colors indicate the $\tau_B$ score between the rankings inferred via the PR algorithm and those inferred by the BT model. The dashed black line indicates the line $R = N$.
  • Figure 5: Partial rankings of CS departments according to faculty hiring patterns. (a) Barplot of BT strengths ($y$-axis) for the 205 PhD-granting institutions included in the dataset, ordered according to their inferred BT rankings ($x$-axis). Colors indicate the PR membership of each institution. A legend displaying the numerical values of these PR strengths is shown in panel (b). The names of the five institutions making up the strongest PR cluster and of the first five institutions in the second-strongest PR cluster (ordered in terms of their BT rank) are shown in the figure. (b) Barplot depicting the PR groups (colored as before) with respect to the USN ranking ($x$-axis) instead of BT rank, for the 153 institutions for which USN data was available. Some notable inconsistencies between the USN rankings of some institutions and those inferred by pairwise comparison methods are shown in the figure. The inverse of USN ranking was plotted along the $y$-axis to provide an analogue to the BT score for these predetermined rankings for easier visualization.
  • ...and 17 more figures
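
The Figure 4 caption compares rankings using a $\tau_B$ score, which is suited to rankings that contain ties. The sketch below implements the standard Kendall $\tau_B$ definition in plain Python to show how a full ranking and a partial ranking of the same items can be compared. It is an illustration only: the function name `kendall_tau_b` and the example rankings are assumptions, and the paper's own computation may differ in detail.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two rank vectors of equal length (ties allowed)."""
    if len(x) != len(y):
        raise ValueError("rank vectors must have the same length")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1  # pair tied in the first ranking
        if dy == 0:
            tied_y += 1  # pair tied in the second ranking
        if dx * dy > 0:
            concordant += 1  # both rankings order the pair the same way
        elif dx * dy < 0:
            discordant += 1  # the rankings order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        return float("nan")  # undefined when one ranking ties every pair
    return (concordant - discordant) / denom


# Hypothetical example: a full ranking of six items and a partial ranking
# that merges ranks {2, 3} and {4, 5, 6} into tied groups.
full_ranking = [1, 2, 3, 4, 5, 6]
partial_ranking = [1, 2, 2, 3, 3, 3]
tau = kendall_tau_b(full_ranking, partial_ranking)
print(round(tau, 3))
```

In this example no pair is ordered in opposite ways by the two rankings, so the score falls below 1 only because of the ties introduced by merging ranks.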

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof