Introducing Grid WAR: Rethinking WAR for Starting Pitchers

Ryan S. Brill; Abraham J. Wyner

TL;DR

The paper critiques standard WAR for starting pitchers as overly reliant on season-long averages and sequencing-agnostic metrics, proposing Grid WAR (GWAR) as a per-game, convex, context-neutral valuation. GWAR combines a grid-based context-neutral win probability $f(I,R)$ with a mid-inning continuation function $g(r|S,O)$, a replacement baseline $w_{\text{rep}}$, and park effects $\boldsymbol{\beta}^{(\text{park})}$, with parameters estimated via a Poisson Empirical Bayes model and ridge park factors. Empirical results reveal that GWAR reweights pitcher contributions, tends to upweight high-variance performances, and provides better predictive validity for future GWAR than traditional FWAR-based estimates, supporting the view that game-by-game variance contains systematic signal. An online Shiny app at gridwar.xyz hosts per-game, per-season, and per-career GWAR results, offering a new, more nuanced lens on pitcher valuation and historical comparisons.
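
To make the roles of $f$, $g$, and $w_{\text{rep}}$ concrete, here is a minimal Python sketch of one plausible reading of how they combine into a single game's value. It assumes a start that ends after a complete inning is worth $f(I,R) - w_{\text{rep}}$, and that a mid-inning exit is valued by averaging $f$ over the additional runs $r$ weighted by $g(r|S,O)$. The functions `toy_f` and `toy_g`, the value `W_REP = 0.4`, and every number in them are hypothetical placeholders rather than fitted quantities, and the park adjustment is left out entirely.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical replacement-level win probability (placeholder, not an estimate).
W_REP = 0.4


def toy_f(inning: int, runs: int) -> float:
    """Hypothetical stand-in for f(I, R): win probability falls as runs rise.

    The inning argument is ignored here to keep the toy example small.
    """
    return 0.9 * 0.8 ** runs


def toy_g(base_state: str, outs: int) -> Dict[int, float]:
    """Hypothetical stand-in for g(r | S, O): P(r more runs score this inning).

    The same distribution is returned for every (S, O) in this toy version.
    """
    return {0: 0.7, 1: 0.2, 2: 0.1}


def game_value(
    f: Callable[[int, int], float],
    g: Callable[[str, int], Dict[int, float]],
    w_rep: float,
    inning: int,
    runs: int,
    exit_state: Optional[Tuple[str, int]] = None,
) -> float:
    """Value of one start relative to replacement, under the assumptions above."""
    if exit_state is None:
        # Starter left after a complete inning: read the grid directly.
        return f(inning, runs) - w_rep
    base_state, outs = exit_state
    # Starter left mid-inning: average f over the runs still expected to score.
    expected_wp = sum(p * f(inning, runs + r) for r, p in g(base_state, outs).items())
    return expected_wp - w_rep


end_of_inning = game_value(toy_f, toy_g, W_REP, inning=6, runs=2)
mid_inning = game_value(toy_f, toy_g, W_REP, inning=6, runs=2, exit_state=("1B", 1))
print(round(end_of_inning, 4), round(mid_inning, 4))
```

In this toy setup the mid-inning exit is worth less than the clean exit with the same runs allowed, because the inherited runners still carry some chance of scoring.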

Abstract

The baseball statistic "Wins Above Replacement" (WAR) has emerged as one of the most popular evaluation metrics. But it is not readily observed and tabulated; WAR is an estimate of a parameter in a vaguely defined model with all its attendant assumptions. Industry-standard models of WAR for starting pitchers from FanGraphs and Baseball Reference all assume that season-long averages are sufficient statistics for a pitcher's performance. This provides an invalid mathematical foundation for many reasons, especially because WAR should not be linear with respect to any counting statistic. To repair this defect, as well as many others, we devise a new measure, Grid WAR, which accurately estimates a starting pitcher's WAR on a per-game basis. The convexity of Grid WAR diminishes the impact of "blow-up" games and upweights exceptional games, raising the valuation of pitchers like Sandy Koufax, Whitey Ford, and Catfish Hunter who exhibit fundamental game-by-game variance. Grid WAR is designed to accurately measure past performance, but also has predictive value insofar as a pitcher's Grid WAR is better than WAR at predicting future performance. Finally, at https://gridwar.xyz we host a Shiny app which displays the Grid WAR results of each MLB game since 1952, including career, season, and game level results, which updates automatically every morning.
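
The convexity argument can be illustrated with a small Python sketch. It compares two made-up pitchers who allow the same average number of runs per game but with different game-by-game spread, using a hypothetical convex, decreasing win-probability curve `toy_win_prob` (an assumption for illustration, not the fitted grid function). Valuing each game and then averaging rewards the high-variance pitcher, while plugging the season average into the curve cannot tell the two apart.

```python
from statistics import mean
from typing import List


def toy_win_prob(runs: float) -> float:
    """Hypothetical convex, decreasing map from runs allowed to win probability."""
    return 0.9 * 0.8 ** runs


def per_game_value(runs_by_game: List[int]) -> float:
    """Value each game separately, then average (the per-game approach)."""
    return mean(toy_win_prob(r) for r in runs_by_game)


def season_average_value(runs_by_game: List[int]) -> float:
    """Average the runs first, then value the average (the season-average approach)."""
    return toy_win_prob(mean(runs_by_game))


# Two invented pitchers with identical mean runs allowed (4 per game).
steady = [4, 4, 4, 4]
volatile = [0, 0, 8, 8]

for name, games in [("steady", steady), ("volatile", volatile)]:
    print(name, round(season_average_value(games), 4), round(per_game_value(games), 4))
```

Because the toy curve is convex, Jensen's inequality guarantees that the per-game value is at least the season-average value, with the gap growing as the spread of outcomes grows.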

Paper Structure (30 sections, 64 equations, 30 figures, 9 tables, 2 algorithms)

Introduction
Why calculate $\text{WAR}$?
Standard $\text{WAR}$ calculations
Problems with standard $\text{WAR}$ calculations for starting pitchers
Paper organization
Defining Grid $\text{WAR}$ for starting pitchers
Grid $\text{WAR}$ formulation
Our Data
Estimating the grid function $f$
Estimating the grid function $g$
Estimating the constant $w_{\text{rep}}$
Estimating the park effects $\alpha$
Results
Averaging pitcher performance across games dilutes the contributions of his great games
Grid $\text{WAR}$ has predictive value
...and 15 more sections

Figures (30)

Figure 1: Context-neutral win probability ($y$-axis) if a starter allowed $R$ runs ($x$-axis) through $I$ complete innings (color) according to the 2019 National League grid function $f$, fit from our Poisson model \ref{eqn:Apoisson_model_2post} with positive Normal prior \ref{eqn:Apoisson_model_2prior_tuned}.
Figure 2: From base-state $S$ (color) and $O=0$ outs through the end of an inning, the context-neutral probability ($y$-axis) that the pitcher allows $R$ runs ($x$-axis) according to the grid function $g$.
Figure 3: Our 2019 three-year park effects ($x$-axis), fit from half-inning data from 2017 to 2019, for each ballpark ($y$-axis). The abbreviations are Retrosheet ballpark codes.
Figure 4: Grid $\text{WAR}$ ($y$-axis) versus FanGraphs $\text{WAR}$ ($\text{RA}/9$) ($x$-axis) for each pitcher-season in 2019. The pitcher name refers to the dot on its immediate left.
Figure 5: Histogram of runs allowed in a game in 2019 for Homer Bailey (left), Tanner Roark (middle), and the difference between these two histograms (right). Even though they have the same $\text{FWAR}$, Bailey has a higher $\text{GWAR}$ than Roark because he has more games in which he allows fewer runs.
...and 25 more figures
