Minimax Hypothesis Testing for the Bradley-Terry-Luce Model

Anuran Makur, Japneet Singh

TL;DR

This work develops a minimax hypothesis-testing framework to decide whether observed pairwise comparison data on a fixed graph could have been generated by a Bradley–Terry–Luce (BTL) model. It introduces a separation-distance measure based on the canonical Markov representation and proposes a test statistic that captures deviation from BTL, with upper and lower bounds on the critical threshold $\varepsilon_{\mathsf{c}}$. The analysis demonstrates minimax-optimal scaling in the complete-graph regime ($\varepsilon_{\mathsf{c}}^2 = \Theta(1/(nk))$) and extends the guarantees, through expansion arguments, to graph classes with bounded principal ratio, including complete, dense regular, and Erdős–Rényi graphs. Experiments on synthetic and real-world datasets, together with a data-driven permutation test for threshold selection, support the practical viability and robustness of the approach, and highlight the role of graph properties in statistical power and in the stability of rankings under model misspecification.

Abstract

The Bradley-Terry-Luce (BTL) model is one of the most widely used models for ranking a collection of items or agents based on pairwise comparisons among them. Given $n$ agents, the BTL model endows each agent $i$ with a latent skill score $\alpha_i > 0$ and posits that the probability that agent $i$ is preferred over agent $j$ is $\alpha_i/(\alpha_i + \alpha_j)$. In this work, our objective is to formulate a hypothesis test that determines whether a given pairwise comparison dataset, with $k$ comparisons per pair of agents, originates from an underlying BTL model. We formalize this testing problem in the minimax sense and define the critical threshold of the problem. We then establish upper bounds on the critical threshold for general induced observation graphs (satisfying mild assumptions) and develop lower bounds for complete induced graphs. Our bounds demonstrate that for complete induced graphs, the critical threshold scales as $\Theta((nk)^{-1/2})$ in a minimax sense. In particular, our test statistic for the upper bounds is based on a new approximation we derive for the separation distance between general pairwise comparison models and the class of BTL models. To further assess the performance of our statistical test, we prove upper bounds on the type I and type II probabilities of error. Much of our analysis is conducted within the context of a fixed observation graph structure, where the graph possesses certain "nice" properties, such as expansion and bounded principal ratio. Additionally, we derive several auxiliary results, such as bounds on principal ratios of graphs, $\ell^2$-bounds on BTL parameter estimation under model mismatch, stability of rankings under the BTL model, etc. We validate our theoretical results through experiments on synthetic and real-world datasets and propose a data-driven permutation testing approach to determine test thresholds.
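As a minimal illustration of the data model described in the abstract, the following Python sketch computes the BTL preference probability $\alpha_i/(\alpha_i + \alpha_j)$ and simulates $k$ comparisons per pair on a complete graph. The function names, the skill scores, and the choice of a complete observation graph are assumptions made for this example only; they are not taken from the paper.

```python
import random


def btl_prob(alpha_i, alpha_j):
    """Probability that agent i is preferred over agent j under BTL."""
    return alpha_i / (alpha_i + alpha_j)


def simulate_comparisons(alpha, k, seed=0):
    """Draw k independent comparisons for every pair (complete graph).

    Returns wins, where wins[i][j] counts how often i was preferred over j.
    """
    rng = random.Random(seed)
    n = len(alpha)
    wins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = btl_prob(alpha[i], alpha[j])
            w = sum(rng.random() < p for _ in range(k))
            wins[i][j] = w
            wins[j][i] = k - w
    return wins


# Hypothetical skill scores for three agents (illustrative values only).
alpha = [1.0, 2.0, 4.0]
wins = simulate_comparisons(alpha, k=50)
```

The hypothesis test studied in the paper asks whether a table of win counts like `wins` is consistent with some choice of skill scores; this sketch only generates data under the null hypothesis and does not implement the test statistic.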

Paper Structure

This paper contains 54 sections, 31 theorems, 287 equations, 5 figures.

Key Result

Proposition 1

For a symmetric comparison set ${\mathcal{E}}$, a pairwise comparison model $\{p_{ij} \in (0,1): (i,j) \in {\mathcal{E}}, \, i \neq j\}$ is a BTL model if and only if its canonical Markov matrix $S \in {\mathbb{R}}^{n \times n}$ is reversible and satisfies the translated skew-symmetry condition $p_{

Figures (5)

  • Figure 1: Illustration of the data transformation process to induce reversibility for a cycle of length four, $i \to j \to k \to l \to i$. The (forward) transition probability corresponding to the data on the left is proportional to $p_{ij}\cdot p_{jk}\cdot p_{kl} \cdot p_{li}$. The (backward) transition probability corresponding to the data on the right is proportional to $p_{il}\cdot p_{lk}\cdot p_{kj} \cdot p_{ji}$.
  • Figure 2: Plots 2a and 2b illustrate the empirical average of $n \cdot T$ under hypothesis $H_1$ and $\mathbbm{1}_{\hat{\mathcal{R}}_\mathsf{m}>1/2}$ for various values of $\eta$ and $n$.
  • Figure 3: Plot 3a illustrates the estimated thresholds $\gamma_0, \gamma_1$ for various values of $n$ and $k$ for complete and Erdős-Rényi random graphs. Plot 3b illustrates the behavior of the estimated threshold $\gamma_1$ for various values of $n$ and $k$ for complete and Erdős-Rényi random graphs under hypothesis $H_1$. The shaded region highlights 90% confidence intervals of the test statistic $T$.
  • Figure 4: Plots 4a and 4b illustrate the scaled test statistic $n \cdot T$ for the cricket ODI dataset and the NBA dataset. The thresholds computed using both the empirical-quantile approach and the permutation-based scheme are also reported for each dataset.
  • Figure 5: Plot of $\sqrt{n}\|\Pi P+ P\Pi - {\mathbf{1}}\pi^\mathrm{T} \|_\mathrm{F}$.
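The forward and backward cycle products in the Figure 1 caption can be compared numerically. Under a BTL model, $p_{ij}/p_{ji} = \alpha_i/\alpha_j$, so the ratios telescope around any cycle and the two products coincide, which is in line with Kolmogorov's cycle criterion for reversibility. The sketch below checks this on a four-cycle and contrasts it with a cyclic, non-BTL model. The helper names and all numeric values are assumptions for illustration, and the code works directly with pairwise probabilities rather than with the paper's canonical Markov matrix.

```python
import math


def btl_matrix(alpha):
    """Pairwise probabilities p[i][j] = alpha_i / (alpha_i + alpha_j)."""
    n = len(alpha)
    return [[alpha[i] / (alpha[i] + alpha[j]) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def cycle_products(p, cycle):
    """Return (forward, backward) products of p along a closed cycle."""
    forward = backward = 1.0
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        forward *= p[a][b]   # p_ij * p_jk * p_kl * p_li
        backward *= p[b][a]  # p_ji * p_kj * p_lk * p_il
    return forward, backward


# BTL model with hypothetical skill scores: the two products agree.
p_btl = btl_matrix([1.0, 2.0, 3.0, 5.0])
f_btl, b_btl = cycle_products(p_btl, [0, 1, 2, 3])

# Cyclic model (each agent beats its successor with probability 0.8):
# still satisfies p_ij + p_ji = 1, but the cycle products differ.
p_cyc = [[0.5] * 4 for _ in range(4)]
for a in range(4):
    b = (a + 1) % 4
    p_cyc[a][b], p_cyc[b][a] = 0.8, 0.2
f_cyc, b_cyc = cycle_products(p_cyc, [0, 1, 2, 3])

print(math.isclose(f_btl, b_btl), math.isclose(f_cyc, b_cyc))
```

A mismatch between the two products on some cycle is one way a pairwise comparison model can fail to be BTL; the paper's test statistic quantifies such deviations in aggregate rather than cycle by cycle.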

Theorems & Definitions (45)

  • Definition 1: Pairwise Comparison Model
  • Definition 2: Canonical Markov Matrix
  • Definition 3: BTL Model [BradleyTerry1952, Luce1959, McFadden1973]
  • Proposition 1: BTL Model and Reversibility
  • Definition 4: Edge Expansion of Non-negative Matrices [MehtaSchulman]
  • Definition 5: Principal Ratio [EigenvectorsIrregularGraphs]
  • Proposition 2: BTL Model Characterization
  • Proposition 3: Decomposition of Weighted Frobenius Norm
  • Theorem 1: Distance to Closest BTL Model
  • Theorem 2: Upper Bound on $\varepsilon_{\mathsf{c}}$
  • ...and 35 more