Applications of Improvements to the Pythagorean Won-Loss Expectation in Optimizing Rosters

Alexander F. Almeida, Kevin Dayaratna, Steven J. Miller, Andrew K. Yang

TL;DR

The paper extends the classic Pythagorean Won-Loss framework by allowing runs scored ($RS$) and runs allowed ($RA$) to arise from independent Weibull distributions with distinct shapes $\gamma_{RS}$ and $\gamma_{RA}$ while fixing the shift $\beta=-\tfrac{1}{2}$. Parameters $(\alpha_{RS},\gamma_{RS},\alpha_{RA},\gamma_{RA})$ are estimated via the Method of Moments from the first two moments of observed per-game runs, after which the win probability $P(X>Y)$ is computed numerically as a two-dimensional integral. This Differently-Shaped Weibull (DSW) model yields improved predictive accuracy over the traditional Pythagorean predictor with $\gamma\approx1.83$ across 30 MLB seasons, at the cost of losing closed-form win probability. The approach also provides a framework for evaluating player value and suggests extensions to other sports, including potential incorporation of higher moments and sector-specific exponents to capture era- and league-specific run profiles.
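The two steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the four moment equations decouple into one mean/variance pair per distribution, solves for each shape by bisection, and uses the closed-form Weibull CDF to collapse the two-dimensional integral for $P(X>Y)$ into a one-dimensional one. The function names and the sample inputs in the usage note are hypothetical.

```python
import math

BETA = -0.5  # shift, fixed at -1/2 as in the summary above


def weibull_moments(alpha, gamma, beta=BETA):
    """Mean and variance of a three-parameter Weibull(alpha, beta, gamma)."""
    g1 = math.gamma(1 + 1 / gamma)
    g2 = math.gamma(1 + 2 / gamma)
    return alpha * g1 + beta, alpha ** 2 * (g2 - g1 ** 2)


def fit_weibull_mom(mean, var, beta=BETA):
    """Method of Moments: recover (alpha, gamma) from a mean and variance."""
    # The squared coefficient of variation of (X - beta) depends only on gamma.
    target = var / (mean - beta) ** 2

    def cv2(gamma):
        g1 = math.gamma(1 + 1 / gamma)
        return math.gamma(1 + 2 / gamma) / g1 ** 2 - 1

    lo, hi = 0.2, 20.0  # assumed search bracket; cv2 decreases as gamma grows
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    alpha = (mean - beta) / math.gamma(1 + 1 / gamma)
    return alpha, gamma


def win_probability(a_rs, g_rs, a_ra, g_ra, n=20000):
    """P(X > Y) for independent Weibulls sharing the same shift.

    Substituting u = F_X(x) gives P = integral over (0, 1) of F_Y(F_X^{-1}(u)),
    evaluated here with the midpoint rule. The shared shift cancels.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        t = -math.log1p(-u)              # -ln(1 - u)
        x = a_rs * t ** (1 / g_rs)       # F_X^{-1}(u) - beta
        total += 1 - math.exp(-((x / a_ra) ** g_ra))
    return total / n
```

As a usage example with made-up inputs, `fit_weibull_mom(4.6, 9.5)` and `fit_weibull_mom(4.2, 8.8)` would give the scored and allowed parameters for a hypothetical team, and passing the four fitted values to `win_probability` gives its estimated winning percentage. When both shapes are equal, the integral reduces to the closed form $(RS-\beta)^\gamma / ((RS-\beta)^\gamma + (RA-\beta)^\gamma)$, which is a convenient sanity check.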

Abstract

Bill James' Pythagorean formula has for decades done an excellent job estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted respectively by ${\rm RS}$ and ${\rm RA}$, there is some $\gamma \approx 2$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One use case is to determine the value of potential signings to the team, as it allows us to estimate how many more wins one obtains over a season given an estimated change in run production and concession. We summarize earlier work on the subject, and extend the earlier theoretical model of Miller (who assumed the home and away teams' runs arise from independent Weibull distributions with the same shape parameter $\gamma$; this has been observed to describe the observed run data well and yields a win probability equivalent to that of James' formula). We extend this work to model runs scored and allowed as being drawn from independent Weibull distributions with different shape parameters, and then consider the first and second moments to solve a system of four equations in the four unknowns. Doing so fits the training data better, yielding a higher winning percentage over the last 30 MLB seasons (1994 to 2023). This comes at a small cost as we no longer have a closed form expression for the win probability, but must evaluate a two-dimensional integral of Weibull distributions and numerically estimate the solutions to the system of equations. These are trivial to do with simple computational programs.
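The roster-valuation use case mentioned in the abstract can be illustrated with the classic formula alone. The sketch below is an assumption-laden example rather than the paper's procedure: it uses the exponent 1.83 quoted for the traditional predictor, a 162-game season, and hypothetical function names and run totals.

```python
def pythag_win_pct(rs, ra, gamma=1.83):
    """Pythagorean winning percentage RS^gamma / (RS^gamma + RA^gamma)."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def extra_wins(rs, ra, d_rs=0.0, d_ra=0.0, games=162, gamma=1.83):
    """Estimated change in season wins when runs scored change by d_rs
    and runs allowed change by d_ra (negative d_ra means runs prevented)."""
    before = pythag_win_pct(rs, ra, gamma)
    after = pythag_win_pct(rs + d_rs, ra + d_ra, gamma)
    return games * (after - before)


# Hypothetical team: 700 runs scored and 700 allowed over a season.
gain_from_offense = extra_wins(700, 700, d_rs=10)    # score 10 more runs
gain_from_defense = extra_wins(700, 700, d_ra=-10)   # allow 10 fewer runs
```

Because the formula depends only on the ratio of runs scored to runs allowed, season totals and per-game averages give the same winning percentage. For this hypothetical .500 team, either change is worth roughly one additional win.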

Paper Structure (10 sections, 24 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: The changing probabilities of a family of Weibulls with $\alpha = 1$, $\beta = 0$, and $\gamma \in \{1, 1.25, 1.5, 1.75, 2\}$; $\gamma = 1$ corresponds to the exponential distribution, and increasing $\gamma$ results in the bump moving rightward.
  • Figure 2: Scatter plot with boxplot representation for each season of the last 30 years (excl. 1994, 1995, 2020) of the Mean Squared Error in Predicted vs. Observed wins yielded by the four different methods: Moments, Pythag(1.83), "New" Least Squares ($\gamma_{\rm RS},\gamma_{\rm RA}$ free), and "Old" Least Squares ($\gamma_{\rm RS}=\gamma_{\rm RA}$). Tied games were included in the data, and counted as 0.5 observed wins for both teams.
  • Figure 3: For the 2022 Washington Nationals, comparison of the Weibulls produced by the Method of Moments against the observed distribution of runs scored (left) and runs allowed (right) per game.
  • Figure 4: For the 2022 Washington Nationals, comparison of the Weibulls produced by the Method of Least Squares against the observed distribution of runs scored (left) and runs allowed (right) per game.
  • Figure 5: The predicted number of additional wins under Pythag(1.83) when: (left) scoring 10 more runs per season; (right) preventing 10 more runs per season.
  • ...and 1 more figures
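
The behavior described in the Figure 1 caption can be checked numerically. The sketch below assumes the standard three-parameter Weibull density $f(x) = \frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1} e^{-((x-\beta)/\alpha)^{\gamma}}$ for $x \ge \beta$ and restricts to $\gamma \ge 1$; the function names are illustrative, not taken from the paper.

```python
import math


def weibull_pdf(x, alpha=1.0, beta=0.0, gamma=1.0):
    """Three-parameter Weibull density; assumes gamma >= 1."""
    if x < beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def weibull_mode(alpha=1.0, beta=0.0, gamma=1.0):
    """Location of the density's peak (the 'bump'); assumes gamma >= 1."""
    return beta + alpha * ((gamma - 1) / gamma) ** (1 / gamma)


# Shape values from the Figure 1 caption, with alpha = 1 and beta = 0.
shapes = [1, 1.25, 1.5, 1.75, 2]
modes = [weibull_mode(gamma=g) for g in shapes]
```

With $\gamma = 1$ the density is $e^{-x}$, the exponential distribution, and its peak sits at the shift $\beta$. The computed modes increase with $\gamma$, matching the caption's description of the bump moving rightward.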