
New insights into Elo algorithm for practitioners and statisticians

Leszek Szczecinski

Abstract

This work reconciles two perspectives on the Elo ranking that coexist in the literature: the practitioner's view as a heuristic feedback rule, and the statistician's view as online maximum likelihood estimation via stochastic gradient ascent. Both perspectives coincide exactly in the binary case (iff the expected score is the logistic function). However, estimation noise forces a principled decoupling between the model used for ranking and the model used for prediction: the effective scale and home-field advantage parameter must be adjusted to account for the noise. We provide both closed-form corrections and a data-driven identification procedure. For multilevel outcomes, an exact relationship exists when outcome scores are uniformly spaced, but approximations are preferred in general: they account for estimation noise and better fit the data. The decoupled approach substantially outperforms the conventional one that reuses the ranking model for prediction, and serves as a diagnostic of convergence status. Applied to six years of FIFA men's ranking, we find that the ranking had not converged for the vast majority of national teams. The paper is written in a semi-tutorial style accessible to practitioners, with all key results accompanied by closed-form expressions and numerical examples.
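The coincidence of the two views in the binary case can be illustrated with a minimal sketch. It assumes the classic base-10 logistic expected score with scale `s`; the function names, the default `K=20` and `s=400`, and the home-field term `hfa` (expressed in rating units) are illustrative assumptions, not the paper's notation. Under these assumptions the practitioner's feedback rule `K * (y - expected)` is a gradient step on the log-likelihood with step size `K * s / ln(10)`.

```python
import math

def expected_score(z, s=400.0):
    """Base-10 logistic expected score for a rating difference z (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** (-z / s))

def elo_update(theta_home, theta_away, y, K=20.0, s=400.0, hfa=0.0):
    """Practitioner's view: move both ratings by K times the prediction error."""
    err = y - expected_score(theta_home - theta_away + hfa, s)
    return theta_home + K * err, theta_away - K * err

def log_likelihood(theta_home, theta_away, y, s=400.0, hfa=0.0):
    """Log-likelihood of a binary outcome y in {0, 1} under the logistic model."""
    p = expected_score(theta_home - theta_away + hfa, s)
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)

def sgd_update(theta_home, theta_away, y, K=20.0, s=400.0, hfa=0.0):
    """Statistician's view: one stochastic gradient ascent step on the log-likelihood."""
    # Analytic gradient w.r.t. theta_home is (ln 10 / s) * (y - expected score);
    # the gradient w.r.t. theta_away is its negative.
    grad = (math.log(10.0) / s) * (y - expected_score(theta_home - theta_away + hfa, s))
    step = K * s / math.log(10.0)  # step size that reproduces the Elo rule
    return theta_home + step * grad, theta_away - step * grad

if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, 1))
    print(sgd_update(1500.0, 1500.0, 1))
```

Both functions return the same pair of ratings for any input, and the sum of the two ratings is preserved by the update.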

Paper Structure

This paper contains 23 sections, 2 theorems, 85 equations, 5 figures, 3 tables.

Key Result

Proposition 1

The model Py.z.identified may be treated as a probability $\Pr\left\{Y_t=y|z\right\}$, if and only if

Proof: For Py.z.identified to satisfy the law of total probability, the following must hold: Since of sum.proba=1.2 is a polynomial, must also be a polynomial of the same order. This requires $b=L-1$, which yields b.from.L and, from a binomial expansion $(1+\mathrm{e}^z)^{L-1}=\sum_{y=0}^{L-1}{{

Figures (5)

  • Figure 1: Trajectories of the estimated skills obtained using the Elo algorithm with a random step $K_t\in\{10,20,30\}$ (lower, blue) or a fixed step $K=60$ (upper, green). For a given $\boldsymbol{\theta}^*$, each curve corresponds to a different realization of $y_t$ and $\boldsymbol{x}_t$. Markers indicate the empirical average (across realizations), and the dashed red lines indicate the 68% credible interval at convergence ($\mathds{E}[\theta_{\infty,m}]\pm \sqrt{\overline{v}}$).
  • Figure 2: Conditional probability functions $\mathsf{P}_y(z/s+\eta)$ from \ref{AC.model}, defining the model with $\boldsymbol{\alpha}$ and $\boldsymbol{\delta}$ given in \ref{alpha.example.L=5} and \ref{delta.example.L=5}, $s=174$, and $\eta=0.8$. The thick solid line denotes the expected value of the score, $G(z/s+\eta)$, given in \ref{G(z).AC}, and the dashed line denotes its approximation by the canonical function $\mathcal{L}(z/\tilde{s}+\tilde{\eta})$, with $\tilde{s}$ and $\tilde{\eta}$ given in \ref{from.s.tilde.to.s} and \ref{tilde.eta.from.eta}.
  • Figure 3: Comparison between $G(z/s)$ and its approximation $\mathcal{L}(z/\tilde{s})$, for $L=3$, $\tilde{s}=s\beta_{{\textnormal{AC}}\rightarrow\mathcal{L}}$, $\beta_{{\textnormal{AC}}\rightarrow\mathcal{L}}$ given in \ref{beta.from.kappa_1}, $\boldsymbol{\delta}=[0,0.5,1]$, $\boldsymbol{\alpha}=[0,\alpha_1,0]$, where the values of $\alpha_1$ are given in the legend; $s= 174$. For smaller values of $z$, the curves practically superimpose. For $\alpha_1=\log 2\approx 0.7$, we have a true equivalence of the expected scores, i.e., $G(z/s)=\mathcal{L}(z/\tilde{s})$, where $\tilde{s}=2s$.
  • Figure 4: Skills of the teams, ordered from best to worst (thin lines with three-letter abbreviations; left axis), and $\beta_t$ (thick dashed line; right axis) estimated via \ref{hat.beta.mini.batch.update}. The shaded regions correspond to the training and testing sets (blue and rose, respectively).
  • Figure 5: Percentage of international teams that have played at least $\Lambda_m=\Lambda$ [time constants] matches by the years 2020, 2022, and 2024. The vertical lines indicate the number of matches equal to one and two average time constants $\overline{\tau}_m$, where $\Lambda\approx 2$ is required to reliably declare convergence in the mean.
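
The exact equivalence stated in the Figure 3 caption can be checked numerically. This is only a sketch under assumptions, because the model definitions are not reproduced on this page: the probabilities are taken to be $\mathsf{P}_y(z)\propto\exp(\alpha_y+\delta_y z)$, and the canonical function $\mathcal{L}$ is taken to be the natural-base logistic function. The helper names are hypothetical.

```python
import math

def ac_probs(z, alpha, delta):
    """Outcome probabilities, ASSUMING the form P_y(z) proportional to exp(alpha_y + delta_y * z)."""
    weights = [math.exp(a + d * z) for a, d in zip(alpha, delta)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score_ac(z, alpha, delta):
    """Expected score G(z) = sum_y delta_y * P_y(z)."""
    return sum(d * p for d, p in zip(delta, ac_probs(z, alpha, delta)))

def logistic(z):
    """Canonical function, ASSUMED here to be the natural-base logistic."""
    return 1.0 / (1.0 + math.exp(-z))

s = 174.0                          # scale from the Figure 3 caption
delta = [0.0, 0.5, 1.0]            # outcome scores for L = 3
alpha = [0.0, math.log(2.0), 0.0]  # alpha_1 = log 2, the exact-equivalence case

# Largest gap between G(z/s) and the logistic with scale 2s over a grid of z.
max_gap = max(
    abs(expected_score_ac(z / s, alpha, delta) - logistic(z / (2.0 * s)))
    for z in range(-800, 801, 50)
)
print(max_gap)
```

Under these assumptions the weights are $1$, $2\mathrm{e}^{z/2}$, and $\mathrm{e}^{z}$, which sum to $(1+\mathrm{e}^{z/2})^2$, so the expected score reduces to the logistic function of $z/2$ and the gap is zero up to rounding.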

Theorems & Definitions (7)

  • Example 1: Illustration of convergence
  • Example 2: Fit to the estimated data
  • Proposition 1
  • Proposition 2
  • Example 3: $L=3$
  • Example 4: Pseudo-model identification
  • Example 5: FIFA Men's ranking