The Counterfactual Combine: A Causal Framework for Player Evaluation

Herbert P. Susmann; Antonio D'Alessandro

TL;DR

This work casts sports player evaluation within a rigorous causal inference framework and defines a flexible class of causal player evaluation estimands. It develops doubly robust estimators for these metrics using modern semiparametric statistical methods, with a focus on Targeted Minimum Loss-based Estimation (TMLE).

Abstract

Evaluating sports players based on their performance shares core challenges with evaluating healthcare providers based on patient outcomes. Drawing on recent advances in healthcare provider profiling, we cast sports player evaluation within a rigorous causal inference framework and define a flexible class of causal player evaluation estimands. Using stochastic interventions, we compare player success rates on repeated tasks (such as field goal attempts or plate appearances) to counterfactual success rates had those same attempts been randomly reassigned to players according to prespecified reference distributions. This setup encompasses direct and indirect standardization parameters familiar from healthcare provider profiling, and we additionally propose a "performance above random replacement" estimand designed for interpretability in sports settings. We develop doubly robust estimators for these evaluation metrics based on modern semiparametric statistical methods, with a focus on Targeted Minimum Loss-based Estimation, and incorporate machine learning methods to capture complex relationships driving player performance. We illustrate our framework in detailed case studies of field goal kickers in the National Football League and batters in Major League Baseball, highlighting how different causal estimands yield distinct interpretations and insights about player performance.
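The comparison described in the abstract can be illustrated with a small plug-in calculation. The sketch below is not the paper's TMLE procedure or its exact estimand definitions; it assumes a hypothetical, already-fitted outcome model `mu(player, distance)`, made-up attempt data, and a uniform reference distribution over players, all invented for illustration.

```python
from statistics import mean

# Hypothetical outcome model: P(success | player, kick distance).
# In practice this would be estimated from data; the numbers are made up.
SKILL = {"A": 0.95, "B": 0.90, "C": 0.85}

def mu(player, distance):
    return max(0.0, SKILL[player] - 0.01 * (distance - 20))

# Made-up attempts: (player, distance in yards, success indicator).
attempts = [
    ("A", 25, 1), ("A", 50, 0),
    ("B", 30, 1), ("B", 45, 1),
    ("C", 20, 1), ("C", 40, 0),
]

def observed_vs_reassigned(player, attempts, reference):
    """Compare a player's observed success rate with the plug-in rate
    had the same attempts been reassigned to players drawn from `reference`."""
    own = [(x, y) for a, x, y in attempts if a == player]
    observed = mean(y for _, y in own)
    counterfactual = mean(
        sum(q * mu(other, x) for other, q in reference.items())
        for x, _ in own
    )
    return observed, counterfactual

# Assumed reference distribution: every player equally likely.
uniform = {p: 1 / len(SKILL) for p in SKILL}
obs, cf = observed_vs_reassigned("A", attempts, uniform)
print(f"observed={obs:.3f} reassigned={cf:.3f} difference={obs - cf:.3f}")
```

Changing `reference` (for example, weighting players by how often they attempt kicks) changes the counterfactual rate, which is one way to see how different reference distributions lead to different evaluation metrics.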

Paper Structure (19 sections, 2 theorems, 26 equations, 8 figures)

Keywords:
Introduction
Prior work
Contributions
Causal Evaluation Metrics
Efficiency Theory
Estimation
Substitution Estimators
Targeted Minimum Loss-based Estimation
Case Studies
NFL Kickers
MLB Batters
Discussion
Identification proof
Performance above random replacement
...and 4 more sections

Key Result

Theorem 1

Let $\| \cdot \|$ denote the $L_2(\mathsf{P})$ norm, and let $\hat{\psi}_a^{\mathsf{rand}}$ be the TMLE estimate of $\psi_{a}^{\mathsf{rand}}$. Assume there exists $\epsilon > 0$ such that $\pi(a' \mid X) > \epsilon$ for all $a' \in \mathcal{A}$, $\mathsf{P}$-almost surely. Suppose that either [...]. Suppose that both $\| \hat{\pi} - \pi \| = o_{\mathsf{P}}(n^{-1/4})$ and $\| \hat{\mu} - \mu \| = o$ [...]
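The positivity assumption in Theorem 1 concerns the true propensity $\pi$, which is not observed, but a rough diagnostic is to look for small values among the estimated propensities. The following is a minimal sketch under assumptions: the function name, the data layout (one dictionary of estimated propensities per attempt), and the numbers are hypothetical and not taken from the paper.

```python
def positivity_violations(propensities, epsilon):
    """Return (attempt index, player) pairs whose estimated
    propensity is at or below the threshold epsilon."""
    return [
        (i, player)
        for i, row in enumerate(propensities)
        for player, p in row.items()
        if p <= epsilon
    ]

# Made-up estimated propensities pi_hat(a' | X_i) for three attempts.
pi_hat = [
    {"A": 0.50, "B": 0.30, "C": 0.20},
    {"A": 0.40, "B": 0.35, "C": 0.25},
    {"A": 0.70, "B": 0.29, "C": 0.01},
]

print(positivity_violations(pi_hat, epsilon=0.05))
```

Flagged pairs point to attempts that a given player would almost never have taken, where comparisons involving that player lean heavily on model extrapolation.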

Figures (8)

Figure 1: Directly standardized field goal success rates for each NFL kicker in the analysis dataset. The red points indicate the point estimate and the horizontal lines depict the 95% confidence interval. The blue points show the empirical field goal success rates. Kickers are arranged in descending order of their point estimate.
Figure 2: Funnel plots visualizing the estimated indirect standardization (left) and random replacement (right) metrics for each NFL kicker in the case study. The $x$-axis shows the statistical precision of the point estimate, and the $y$-axis is the point estimate. Dashed lines indicate control limits at the 97.5%, 99%, and 99.9% confidence levels.
Figure 3: Comparison of the cross-fitted propensity score estimates for each field goal attempt in the analysis dataset for S. Gostkowski, P. Dawson, and T. Bass. Each point represents a single field goal attempt. Attempts performed on a grass surface are shown in blue, and attempts on artificial turf are shown in red. The linear best-fit line is included to emphasize the correlation between propensity score estimates for each player.
Figure 4: Cluster dendrogram of NFL kickers based on the Euclidean distance between their normalized propensity score estimates.
Figure 5: Funnel plots visualizing the estimated indirect standardization (left) and random replacement (right) metrics for each MLB batter in the case study. The $x$-axis shows the statistical precision of the point estimate, and the $y$-axis is the point estimate. Dashed lines indicate control limits at the 97.5%, 99% and 99.9% confidence levels.
...and 3 more figures
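
The control limits in the funnel plots of Figures 2 and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the paper's construction: it assumes that precision is the reciprocal of the standard error, that the estimate is approximately normal around a null value of zero, and that each level is treated as a two-sided interval. The function names and the example values are hypothetical.

```python
from statistics import NormalDist

def control_limits(precision, level, center=0.0):
    """Lower and upper control limits at a given precision,
    assuming precision = 1 / standard error (an assumption)."""
    se = 1.0 / precision
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    return center - z * se, center + z * se

def outside_limits(estimate, precision, level, center=0.0):
    """True if the point estimate falls outside the control limits."""
    lower, upper = control_limits(precision, level, center)
    return estimate < lower or estimate > upper

# Made-up (point estimate, precision) pairs for two players.
players = {"player_1": (0.04, 40.0), "player_2": (0.20, 25.0)}
for name, (est, prec) in players.items():
    flags = {lvl: outside_limits(est, prec, lvl) for lvl in (0.975, 0.99, 0.999)}
    print(name, flags)
```

Because the limits scale with the standard error, they narrow as precision increases, which produces the funnel shape.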

Theorems & Definitions (3)

Theorem 1: Consistency and asymptotic normality
Theorem 2: von Mises expansion
proof
