Luck, skill, and depth of competition in games and social hierarchies

Maximilian Jerdee; M. E. J. Newman

TL;DR

This work extends the classic Bradley–Terry (BT) ranking framework by introducing a luck parameter $\alpha$ and a depth parameter $\beta$, which together define a generalized score function $f_{\alpha\beta}(s)$. A Bayesian inference pipeline (with a Gaussian prior on scores) estimates the scores $s_i$, $\alpha$, and $\beta$ from pairwise outcomes and enables predictive evaluation against baselines such as standard BT and SpringRank. Across sports, games, and social hierarchies (human and animal), the authors find sports to be shallow with limited luck, while social hierarchies are deeper and often exhibit nonzero luck, with animal hierarchies generally deeper than human ones. The model shows superior predictive performance in cross-validation and yields interpretable measures of competition structure. An open-source software package for pairwise ranking accompanies the work.

Abstract

Patterns of wins and losses in pairwise contests, such as occur in sports and games, consumer research and paired comparison studies, and human and animal social hierarchies, are commonly analyzed using probabilistic models that allow one to quantify the strength of competitors or predict the outcome of future contests. Here we generalize this approach to incorporate two additional features: an element of randomness or luck that leads to upset wins, and a "depth of competition" variable that measures the complexity of a game or hierarchy. Fitting the resulting model to a large collection of data sets we estimate depth and luck in a range of games, sports, and social situations. In general, we find that social competition tends to be "deep," meaning it has a pronounced hierarchy with many distinct levels, but also that there is often a nonzero chance of an upset victory, meaning that dominance challenges can be won even by significant underdogs. Competition in sports and games, by contrast, tends to be shallow and in most cases there is little evidence of upset wins, beyond those already implied by the shallowness of the hierarchy.
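
As an illustration of the generalized score function $f_{\alpha\beta}(s)$ mentioned in the TL;DR, here is a minimal Python sketch. This summary does not state the functional form, so the code assumes one: the luck parameter $\alpha$ mixes the logistic curve with a fair coin flip, and the depth parameter $\beta$ rescales the score difference $s$. The name `score_function` and this exact form are assumptions made for illustration, chosen so that $\alpha = 0$, $\beta = 1$ recovers the standard logistic function of the Bradley–Terry model.

```python
import math


def _logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score_function(s, alpha=0.0, beta=1.0):
    """Probability that a player whose score exceeds the opponent's by s wins.

    ASSUMED form (not given in this summary):
        f(s) = alpha / 2 + (1 - alpha) / (1 + exp(-beta * s))
    alpha in [0, 1] is the luck parameter, beta > 0 the depth parameter.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    return 0.5 * alpha + (1.0 - alpha) * _logistic(beta * s)


if __name__ == "__main__":
    # Standard Bradley-Terry curve versus a luckier, deeper variant.
    for s in (-2.0, 0.0, 2.0):
        print(s, score_function(s), score_function(s, alpha=0.2, beta=3.0))
```

Under this assumed form, evenly matched players always win with probability 1/2, and with `alpha=0.2` even an overwhelming favourite wins with probability at most 0.9, which is one way to picture "upset wins" that persist however large the skill gap.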

Paper Structure (16 sections, 40 equations, 7 figures, 2 tables)

Introduction
The model
Extensions of the model
Upset wins and luck
Depth of competition
Combined model
Minimum violations ranking
Results
Predicting wins and losses
Conclusions
Data sets
Cross-validation
Point estimates of parameters
Other measures of depth
Depth as predictability
...and 1 more sections

Figures (7)

Figure 1: Score functions $f(s)$. (a) The bold curve represents the standard logistic function $f(s) = 1/(1+e^{-s})$ used in the Bradley-Terry model. The remaining curves show the function $f_\alpha$ of Eq. \ref{eq:fu} for increasing values of the luck parameter $\alpha$. (b) The score function $f_\beta$ of Eq. \ref{eq:fbeta} for different values of the depth of competition $\beta$, both greater than 1 (steeper) and less than 1 (shallower).
Figure 2: (a) Each cloud represents the posterior distribution $P(\alpha,\beta|\bm{A})$ of the luck and depth parameters for a single data set, calculated from the Monte Carlo sampled values of $\alpha$ and $\beta$ using a Gaussian kernel density estimate. The + signs indicate the expected values $\hat{\alpha},\hat{\beta}$ of the parameters for each data set. (b) Fitted functions $f_{\alpha\beta}(s)$ for a selection of the data sets. The bold curve in each case corresponds to the expected values $\hat{\alpha},\hat{\beta}$, while the other surrounding curves are for a selection of values sampled from the posterior distribution, to give an idea of the variation around the average.
Figure 3: Comparative performance of the model of this paper and a selection of competing models and methods, in the task of predicting the outcome of unobserved matches in a cross-validation experiment. Performance is measured in terms of the log-likelihood (base 2) of the actual outcomes of matches within the fitted model, which is also equal to minus the description length in bits required to transmit the win/loss data given the fitted model. Log-likelihoods are plotted relative to that of the standard Bradley-Terry model with a logistic prior (the horizontal dashed line). Error bars represent upper and lower quartiles over at least 50 random repetitions of the cross-validation procedure in each case. The arrows along the bottom of the plot indicate cases where the log-likelihood is outside the range of the plot.
Figure 4: Results from the same set of cross-validation tests shown in Fig. 3, but quantified using (a) accuracy and (b) log-posterior predictive probability, instead of log-likelihood. All results are measured relative to the Bradley-Terry model with a logistic prior, which is represented as the dashed horizontal line in each panel. Error bars represent upper and lower quartiles, estimated from at least 50 random repetitions of the cross-validation procedure in each case. The maximum likelihood and SpringRank models are not included in the lower comparison, since they are based on point estimates rather than Bayesian methods and hence one cannot calculate a posterior-predictive probability.
Figure 5: Absolute log-likelihood values per match in the cross-validation tests of Fig. 3. This figure differs from Fig. 3 in showing absolute values rather than values relative to the Bradley-Terry model with logistic prior.
...and 2 more figures
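
The Figure 3 caption measures performance by the base-2 log-likelihood of the actual match outcomes, which equals minus the description length in bits. The sketch below shows that bookkeeping on a toy example. It assumes the standard Bradley-Terry win probability $1/(1+e^{-(s_i - s_j)})$ and fixed point-estimate scores, and it does not reproduce the paper's Bayesian fit or cross-validation procedure. The function names and the toy data are hypothetical.

```python
import math


def bt_win_probability(s_i, s_j):
    """Standard Bradley-Terry probability that player i beats player j."""
    d = s_i - s_j
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def log2_likelihood(matches, scores):
    """Base-2 log-likelihood of observed (winner, loser) pairs given scores."""
    total = 0.0
    for winner, loser in matches:
        p = bt_win_probability(scores[winner], scores[loser])
        total += math.log2(p)
    return total


def description_length_bits(matches, scores):
    """Bits needed to transmit the outcomes given the model (minus log2-likelihood)."""
    return -log2_likelihood(matches, scores)


if __name__ == "__main__":
    # Hypothetical held-out matches, each recorded as (winner, loser).
    held_out = [("a", "b"), ("b", "c"), ("a", "c")]
    flat = {"a": 0.0, "b": 0.0, "c": 0.0}      # no skill differences
    ranked = {"a": 2.0, "b": 1.0, "c": 0.0}    # a > b > c
    print(description_length_bits(held_out, flat))
    print(description_length_bits(held_out, ranked))
```

With all scores equal, every match is a coin flip and costs exactly one bit. A model whose scores agree with the held-out results needs fewer bits, which is the sense in which a higher log-likelihood indicates better prediction.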
