Anyone for chess? Analysing chess ratings above high thresholds
Nils Lid Hjort
TL;DR
The paper develops a two-parameter threshold-tail model for extreme chess ratings, capturing the upper tail with density $f(x;a,\theta)$ and representing $X$ as $r_0+\theta V_a^a$ with $V_a\sim\Gamma(a,1)$. It derives moment and maxima properties, including $E M_n \doteq r_0 + \theta (\log n)^a$ and a Gumbel limit under normalization, and provides ML-based estimation, including options when only top-$k$ data are available. Applied to FIDE data above $2100$ in January 2026, the model fits well for both men and women, with similar tail index $a$ but larger male tail scale $\theta$, yielding a fatter male upper tail and a significant top-level gender gap confirmed by bootstrap tests. The work offers a practical framework for monitoring progress, predicting tournament outcomes from high-score data, and extending extreme-value analyses to other domains with high-threshold performance data.
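The representation $X = r_0+\theta V_a^a$ with $V_a\sim\Gamma(a,1)$ can be simulated directly, which gives a quick feel for how the tail behaves. The following is a minimal sketch using only the Python standard library. The parameter values `R0`, `THETA` and `A` and all function names are hypothetical, chosen for illustration only; they are not the fitted values from the paper. The sketch compares the simulated mean with the exact value $r_0+\theta\,\Gamma(2a)/\Gamma(a)$ implied by the representation, and prints the simulated maximum next to the first-order approximation $r_0+\theta(\log n)^a$, which should only be expected to agree roughly.

```python
import math
import random


def sample_rating(r0, theta, a, rng):
    """Draw one rating X = r0 + theta * V**a with V ~ Gamma(shape=a, scale=1)."""
    v = rng.gammavariate(a, 1.0)
    return r0 + theta * v ** a


def model_mean(r0, theta, a):
    """Exact mean under the representation, using E[V**a] = Gamma(2a) / Gamma(a)."""
    return r0 + theta * math.gamma(2 * a) / math.gamma(a)


def first_order_expected_max(r0, theta, a, n):
    """First-order approximation r0 + theta * (log n)**a for the expected maximum."""
    return r0 + theta * math.log(n) ** a


# Hypothetical parameter values for illustration only (not fitted values).
R0, THETA, A = 2100.0, 60.0, 1.2

rng = random.Random(2026)  # fixed seed so the run is reproducible
ratings = [sample_rating(R0, THETA, A, rng) for _ in range(20000)]
sample_mean = sum(ratings) / len(ratings)

print(f"sample mean        : {sample_mean:.1f}")
print(f"model mean         : {model_mean(R0, THETA, A):.1f}")
print(f"sample maximum     : {max(ratings):.1f}")
print(f"first-order E[M_n] : {first_order_expected_max(R0, THETA, A, len(ratings)):.1f}")
```

Since $V_a \ge 0$, every simulated rating lies at or above the threshold $r_0$, as a threshold-tail model requires.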
Abstract
Suppose some cleverness score parameter is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such scores could have Gaussian distributions when all players in a stratum are considered, but when interest focuses on the extreme tails, that is, the top few percent above certain high thresholds, different models are called for, along with methods for analysing them based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 and above-2100-points lists for regular chess ratings, for the currently active 14671 men and 753 women, as given by FIDE, January 2026. It is argued that even when two or more distributions have nearly identical expected values, or medians, even small differences in variance may explain gaps among the very best.
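The closing claim about variance can be illustrated with a small calculation. The sketch below does not use the paper's threshold-tail model. It assumes two hypothetical Gaussian populations with the same mean and standard deviations differing by 5 percent, and computes how the ratio of their upper-tail probabilities changes as the threshold rises. The values `MEAN`, `SD_NARROW` and `SD_WIDE` are invented for illustration and are not estimates from FIDE data.

```python
import math


def normal_upper_tail(threshold, mean, sd):
    """P(X > threshold) for X ~ N(mean, sd**2), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(threshold, mean, sd_wide, sd_narrow):
    """Ratio of upper-tail probabilities: wider population over narrower one."""
    return (normal_upper_tail(threshold, mean, sd_wide)
            / normal_upper_tail(threshold, mean, sd_narrow))


# Hypothetical populations: identical means, standard deviations 5% apart.
MEAN = 1500.0
SD_NARROW, SD_WIDE = 300.0, 315.0

for threshold in (1500.0, 2100.0, 2700.0):
    ratio = tail_ratio(threshold, MEAN, SD_WIDE, SD_NARROW)
    print(f"threshold {threshold:.0f}: tail ratio wide/narrow = {ratio:.2f}")
```

At the common mean the two populations are equally represented, but the ratio grows with the threshold, so the wider population increasingly dominates the extreme top even though the means coincide.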
