Minimax Hypothesis Testing for the Bradley-Terry-Luce Model
Anuran Makur, Japneet Singh
TL;DR
This work develops a minimax hypothesis-testing framework for deciding whether pairwise comparison data observed on a fixed graph were generated by a Bradley–Terry–Luce (BTL) model. It introduces a separation-distance measure based on the canonical Markov representation and proposes a test statistic that captures deviation from BTL, providing upper and lower bounds on the critical threshold. With $n$ agents and $k$ comparisons per pair, the analysis shows minimax-optimal scaling in the complete-graph regime, where the critical threshold $ε_c$ satisfies $ε_c^2 = Θ(1/(nk))$, and extends guarantees to graph classes with bounded principal ratio through expansion arguments, including complete, dense regular, and Erdős–Rényi graphs. Experiments on synthetic and real-world datasets, along with a data-driven permutation test for threshold selection, support the practical viability and robustness of the approach, and highlight the role of graph properties in statistical power and in the stability of rankings under model misspecification.
Abstract
The Bradley-Terry-Luce (BTL) model is one of the most widely used models for ranking a collection of items or agents based on pairwise comparisons among them. Given $n$ agents, the BTL model endows each agent $i$ with a latent skill score $α_i > 0$ and posits that the probability that agent $i$ is preferred over agent $j$ is $α_i/(α_i + α_j)$. In this work, our objective is to formulate a hypothesis test that determines whether a given pairwise comparison dataset, with $k$ comparisons per pair of agents, originates from an underlying BTL model. We formalize this testing problem in the minimax sense and define the critical threshold of the problem. We then establish upper bounds on the critical threshold for general induced observation graphs (satisfying mild assumptions) and develop lower bounds for complete induced graphs. Our bounds demonstrate that for complete induced graphs, the critical threshold scales as $Θ((nk)^{-1/2})$ in a minimax sense. In particular, our test statistic for the upper bounds is based on a new approximation we derive for the separation distance between general pairwise comparison models and the class of BTL models. To further assess the performance of our statistical test, we prove upper bounds on the type I and type II error probabilities. Much of our analysis is conducted within the context of a fixed observation graph structure, where the graph possesses certain ``nice'' properties, such as expansion and bounded principal ratio. Additionally, we derive several auxiliary results, including bounds on principal ratios of graphs, $\ell^2$-bounds on BTL parameter estimation under model mismatch, and stability of rankings under the BTL model. We validate our theoretical results through experiments on synthetic and real-world datasets and propose a data-driven permutation testing approach to determine test thresholds.
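To make the data model in the abstract concrete, the following is a minimal sketch of sampling pairwise comparisons from a BTL model on a complete observation graph, with $k$ comparisons per pair. The function names, the example skill scores, and the choice to store win counts in a dictionary are illustrative assumptions, not taken from the paper; the sketch only generates data and does not implement the paper's test statistic.

```python
import random


def btl_win_probability(alpha_i, alpha_j):
    """Probability that agent i is preferred over agent j: alpha_i / (alpha_i + alpha_j)."""
    return alpha_i / (alpha_i + alpha_j)


def simulate_comparisons(alpha, k, seed=0):
    """Draw k independent comparisons for every pair of agents (complete graph).

    Returns a dict mapping (i, j) with i < j to the number of the k
    comparisons in which agent i was preferred over agent j.
    The seed argument is a hypothetical convenience for reproducibility.
    """
    rng = random.Random(seed)
    n = len(alpha)
    wins = {}
    for i in range(n):
        for j in range(i + 1, n):
            p = btl_win_probability(alpha[i], alpha[j])
            # Each comparison is an independent Bernoulli(p) trial.
            wins[(i, j)] = sum(rng.random() < p for _ in range(k))
    return wins


# Hypothetical skill scores for n = 3 agents, with k = 1000 comparisons per pair.
alpha = [1.0, 2.0, 4.0]
k = 1000
wins = simulate_comparisons(alpha, k)

for (i, j), w in sorted(wins.items()):
    print(f"pair ({i}, {j}): empirical {w / k:.3f}, "
          f"model {btl_win_probability(alpha[i], alpha[j]):.3f}")
```

Under the BTL hypothesis, the empirical win fractions should concentrate around $α_i/(α_i + α_j)$ as $k$ grows; the testing problem asks whether an observed dataset of this form is consistent with some choice of skill scores.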
