Anyone for chess? Analysing chess ratings above high thresholds

Nils Lid Hjort

TL;DR

The paper develops a two-parameter threshold-tail model for extreme chess ratings, capturing the upper tail above a threshold $r_0$ with density $f(x;a,\theta)$ and representing $X$ as $r_0+\theta V_a^a$ with $V_a\sim\Gamma(a,1)$. It derives moment and maxima properties, including $E M_n \doteq r_0 + \theta (\log n)^a$ for the maximum $M_n$ of $n$ ratings and a Gumbel limit for the suitably normalised maximum, and provides maximum likelihood estimation, including options when only top-$k$ data are available. Applied to FIDE ratings above 2100 as of January 2026, the model fits well for both men and women, with similar tail index $a$ but larger male tail scale $\theta$, yielding a fatter male upper tail and a significant top-level gender gap confirmed by bootstrap tests. The work offers a practical framework for monitoring progress, predicting tournament outcomes from high-score data, and extending extreme-value analyses to other domains with high-threshold performance data.
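
As a minimal sketch of the representation above, assuming it holds as stated ($X=r_0+\theta V_a^a$ with $V_a\sim\Gamma(a,1)$), a change of variables gives the density $\exp\{-((x-r_0)/\theta)^{1/a}\}/\{\theta\,\Gamma(1+a)\}$ for $x\ge r_0$; the paper's own parametrisation of $f$ may differ. The values of `A` and `THETA` below are hypothetical placeholders, not estimates from the paper, and all function names are illustrative.

```python
import math
import numpy as np

R0 = 2100.0     # threshold used in the paper's application
A = 0.6         # hypothetical tail index, NOT an estimate from the paper
THETA = 150.0   # hypothetical tail scale, NOT an estimate from the paper


def sample_ratings(n, a=A, theta=THETA, r0=R0, seed=0):
    """Draw n ratings via X = r0 + theta * V**a with V ~ Gamma(a, 1)."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=a, scale=1.0, size=n)
    return r0 + theta * v**a


def implied_density(x, a=A, theta=THETA, r0=R0):
    """Density implied by the representation (change of variables)."""
    y = (np.asarray(x, dtype=float) - r0) / theta
    # abs() only guards the fractional power for y < 0, where the density is 0.
    dens = np.exp(-np.abs(y) ** (1.0 / a)) / (theta * math.gamma(1.0 + a))
    return np.where(y >= 0.0, dens, 0.0)


def theoretical_mean(a=A, theta=THETA, r0=R0):
    """E X = r0 + theta * Gamma(2a) / Gamma(a), since E V**a = Gamma(2a)/Gamma(a)."""
    return r0 + theta * math.gamma(2.0 * a) / math.gamma(a)


def leading_order_expected_max(n, a=A, theta=THETA, r0=R0):
    """Leading-order approximation r0 + theta * (log n)**a quoted in the TL;DR."""
    return r0 + theta * math.log(n) ** a


if __name__ == "__main__":
    x = sample_ratings(200_000)
    print("simulated mean:  ", x.mean())
    print("theoretical mean:", theoretical_mean())
    print("leading-order E M_n for n = 1000:", leading_order_expected_max(1000))
```

The leading-order formula for $E M_n$ is an approximation, so simulated maxima need not match it closely for moderate $n$.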

Abstract

Suppose some cleverness score is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such scores could have Gaussian distributions across all players in a stratum, but when interest focuses on the far tails, for the top few percent above certain high thresholds, different models are called for, along with methods for analysing them based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 lists and the above-2100-points lists for regular chess ratings, for the currently active 14671 men and 753 women, as given by FIDE, January 2026. It is argued that even when two or more distributions have close to identical expected values, or medians, even small differences in variance may explain gaps among the few very best.
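
The closing claim of the abstract can be illustrated with a small sketch, assuming for simplicity two Gaussian populations with the same mean and slightly different standard deviations. The numbers are illustrative only and are not taken from the paper; the function names are hypothetical.

```python
import math


def normal_upper_tail(x, mean=0.0, sd=1.0):
    """P(X > x) for X ~ N(mean, sd^2), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(threshold, sd_wide, sd_narrow, mean=0.0):
    """Ratio of upper-tail probabilities for two equal-mean Gaussians."""
    wide = normal_upper_tail(threshold, mean, sd_wide)
    narrow = normal_upper_tail(threshold, mean, sd_narrow)
    return wide / narrow


if __name__ == "__main__":
    # Same mean, 10 percent larger standard deviation (illustrative values).
    for threshold in (1.0, 2.0, 3.0, 4.0):
        print(threshold, tail_ratio(threshold, sd_wide=1.1, sd_narrow=1.0))
```

The ratio grows with the threshold, which is the mechanism the abstract points to: a modest variance difference matters little near the centre and a great deal in the far tail.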

Paper Structure (7 sections, 25 equations, 4 figures)

Introduction: the sizes of maxima
A parametric model for ratings over threshold
Properties
Estimation and inference
Chess ratings, for men and women, per January 2026
Assessing the gender gap
Concluding remarks

Figures (4)

Figure 1: Queen's Gambit.
Figure 2: Left panel: fitted densities $\widehat{f}_m$ and $\widehat{f}_w$, for men (full, black) and women (red, slanted) having rating above 2100; they are not far apart but the right tail is dominated by the men. Right panel: fitted c.d.f.s $\widehat{F}_m$ and $\widehat{F}_w$, along with the empirical c.d.f.s (very slightly non-continuous, as dotted curves). The parametric and nonparametric estimates are very close, i.e. the model used is fully adequate.
Figure 3: Left panel: the difference $S_m(x)-S_w(x)$, empirical (black, wiggly) and parametrically fitted (red, slanted, smooth), along with a 90 percent confidence band. Right panel: empirical and theoretical quantile functions $Q_m$ (black, full) and $Q_w$ (red, slanted). The model fit is excellent.
Figure 4: Left panel: the estimated log-densities $\log\widehat{f}_m$ and $\log\widehat{f}_w$, indicating that the men's tail is fatter and longer than the women's tail; more top players are, indeed, men. Right panel: bootstrap points $(A^*,B^*)$ for 1,000 datasets drawn from the bootstrap null distribution, along with the observed $(A_{\rm obs},B_{\rm obs})$, for $A$ the 0.90 quantile difference and $B$ the standard deviation difference.
