Skill Issues: An Analysis of CS:GO Skill Rating Systems

Mikel Bober-Irizar; Naunidh Dua; Max McGuinness

TL;DR

This paper performs an empirical analysis of Elo, Glicko2 and TrueSkill through the lens of surrogate modelling, where skill ratings influence future matchmaking with a configurable acquisition function.

Abstract

The meteoric rise of online games has created a need for accurate skill rating systems for tracking improvement and fair matchmaking. Although many skill rating systems are deployed, with various theoretical foundations, less work has been done on analysing the real-world performance of these algorithms. In this paper, we perform an empirical analysis of Elo, Glicko2 and TrueSkill through the lens of surrogate modelling, where skill ratings influence future matchmaking with a configurable acquisition function. We examine both overall performance and data efficiency, and perform a sensitivity analysis based on a large dataset of Counter-Strike: Global Offensive matches.
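To make the surrogate-modelling framing concrete, the sketch below pairs the standard Elo update with a simple acquisition function that picks, from a pool of candidate matches, the one whose predicted outcome is closest to even. It is an illustration under stated assumptions rather than the paper's implementation: the constants `K` and `START`, the team names, and the helper `closest_match` are hypothetical, and the paper's own acquisition functions are not reproduced here.

```python
from collections import defaultdict

K = 32.0        # assumed update step; not a value taken from the paper
START = 1500.0  # assumed initial rating for unseen teams


def expected_score(r_a, r_b):
    """Standard Elo win probability of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(ratings, winner, loser, k=K):
    """Move both ratings by k times the prediction error (zero-sum)."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def closest_match(ratings, candidates):
    """Hypothetical acquisition function: pick the most evenly matched pair."""
    return min(
        candidates,
        key=lambda m: abs(expected_score(ratings[m[0]], ratings[m[1]]) - 0.5),
    )


ratings = defaultdict(lambda: START)
elo_update(ratings, "A", "B")  # A beats B
elo_update(ratings, "A", "C")  # A beats C
pick = closest_match(ratings, [("A", "B"), ("B", "C")])
```

Here the ratings play the role of the surrogate model, and `closest_match` decides which match is observed next. Swapping in a different selection rule changes which data the rating system learns from without touching the update itself.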

Paper Structure (29 sections, 17 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Related Work
Methods
Framework
Simulator
Emulators
WinRate
Elo
Glicko2
TrueSkill
TrueSkillPlayers
Acquisition Functions (AFs)
Expected Improvement
Cheater's AF
Gaussian Process
...and 14 more sections

Figures (3)

Figure 1: The architecture of the skillbench library. Emulators and their Acquisition Functions are implemented as modular components following a common interface. A Simulator takes an Emulator, trains it on a training MatchDataset, and evaluates it on an evaluation MatchDataset.
Figure 2: Sensitivity analysis for our TrueSkill emulators, after 2000 training matches selected by the LikeliestDraw (\ref{eq:AFdraw}) acquisition function. The red $\times$ marks the default TrueSkill parameters, with the plot spanning $\pm 1$ order of magnitude. Note that the scale in (a) covers a much larger range than the scale in (b).
Figure 3: The training and evaluation accuracy across emulators, using the Random and Weighted (\ref{eq:AFweighted}) AFs. As the dataset is exhausted, both acquisition functions end up training on all matches, just in a different order. Error bars show $\pm 1 \sigma$ of aleatoric uncertainty, i.e. the variation between individual runs of the Simulator.
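
The train-then-evaluate flow described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the skillbench API: the class `WinRateEmulator`, the function `simulate`, the `(team_a, team_b, a_won)` match layout, and the smoothing rule are all assumptions made for the example.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical match record: (team_a, team_b, a_won)
Match = Tuple[str, str, bool]


class WinRateEmulator:
    """Toy emulator that rates each team by its smoothed win rate."""

    def __init__(self) -> None:
        self.wins = defaultdict(int)
        self.games = defaultdict(int)

    def observe(self, match: Match) -> None:
        a, b, a_won = match
        self.games[a] += 1
        self.games[b] += 1
        self.wins[a if a_won else b] += 1

    def rate(self, team: str) -> float:
        # Add-one smoothing so unseen teams start at 0.5 (an assumption).
        return (self.wins[team] + 1) / (self.games[team] + 2)

    def predict(self, a: str, b: str) -> bool:
        """Return True if team a is predicted to win."""
        return self.rate(a) >= self.rate(b)


def simulate(emulator, train: List[Match], evaluation: List[Match]) -> float:
    """Train on one match list, then report accuracy on another."""
    for match in train:
        emulator.observe(match)
    correct = sum(emulator.predict(a, b) == a_won for a, b, a_won in evaluation)
    return correct / len(evaluation)


train = [("A", "B", True), ("A", "C", True), ("B", "C", True)]
evaluation = [("A", "C", True), ("C", "B", False)]
accuracy = simulate(WinRateEmulator(), train, evaluation)
```

Any object exposing the same `observe` and `predict` methods could be passed to `simulate`, which is the point of a common interface: the rating system can be replaced without changing the evaluation loop.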
