On the relationship between the Wasserstein distance and differences in life expectancy at birth
Markus Sauerberg
TL;DR
The paper compares life table age-at-death distributions using the one-dimensional Wasserstein distance $W_1$ and proves that, when the survivorship functions do not cross, $W_1(d_A,d_B)=|e_{0,A}-e_{0,B}|$, so that a gap in life expectancy at birth can be read as a distributional difference. It derives the theoretical link between $e_0$ and the age-at-death distribution, provides proofs and graphical illustrations, and extends the approach to cause-specific mortality using a two-dimensional optimal transport (OT) framework solved with the POT (Python Optimal Transport) library. Empirically, analyses of 5,000 country pairs, using both period and cohort life tables from the Human Mortality Database, show a very strong correspondence between $W_1$ and $\Delta e_0$ (Pearson $r\approx0.99$); the two measures diverge when the survivorship functions cross. The work offers a distributional perspective on longevity differences, suggests using $W_1$ alongside life expectancy and health expectancy measures, and lays the groundwork for further two-dimensional extensions and health expectancy applications.
Abstract
The Wasserstein distance is a metric for assessing distributional differences. The measure originates in optimal transport theory and can be interpreted as the minimal cost of transforming one distribution into another. In this paper, the Wasserstein distance is applied to life table age-at-death distributions. The main finding is that, under certain conditions, the Wasserstein distance between two age-at-death distributions equals the corresponding gap in life expectancy at birth ($e_0$). More specifically, the paper shows mathematically and empirically that this equivalence holds whenever the survivorship functions do not cross. For example, this applies when comparing mortality between women and men from 1990 to 2020 using data from the Human Mortality Database. In such cases, the gap in $e_0$ reflects not only a difference in mean ages at death but can also be interpreted directly as a measure of distributional difference.
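The equivalence can be checked numerically. In one dimension, $W_1$ equals the area between the two cumulative distribution functions, which is also the area between the two survivorship functions; if these do not cross, that area reduces to the absolute difference in mean ages at death. The following is a minimal sketch under stated assumptions, not the paper's implementation: the Gompertz parameters, the single-year age grid closed at age 110, and the helper names `age_at_death_dist`, `w1`, and `mean_age` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def age_at_death_dist(a, b, max_age=110):
    """Discrete age-at-death distribution for a Gompertz hazard a*exp(b*x).

    Illustrative assumption: deaths are assigned to integer ages, and
    everyone still alive at max_age dies at max_age.
    """
    ages = np.arange(max_age + 1)
    # Survivorship to exact age x under the Gompertz hazard.
    surv = np.exp(-(a / b) * (np.exp(b * ages) - 1.0))
    d = np.empty(max_age + 1)
    d[:-1] = surv[:-1] - surv[1:]   # deaths between ages x and x+1
    d[-1] = surv[-1]                # remaining survivors close out the table
    return ages, d

def w1(d_a, d_b):
    """W1 on a shared unit-spaced grid: sum of |F_A - F_B| over the grid."""
    return float(np.abs(np.cumsum(d_a) - np.cumsum(d_b)).sum())

def mean_age(ages, d):
    """Mean age at death on the grid (differs from e0 only by a constant
    within-interval offset, which cancels in differences)."""
    return float(np.dot(ages, d))

# Populations A and B: proportional hazards, so survivorship never crosses.
ages, d_a = age_at_death_dist(a=0.00005, b=0.10)
_, d_b = age_at_death_dist(a=0.00010, b=0.10)

# Population C: lower old-age mortality than A, so survivorship crosses.
_, d_c = age_at_death_dist(a=0.00050, b=0.07)

gap_ab = mean_age(ages, d_a) - mean_age(ages, d_b)
gap_ac = mean_age(ages, d_a) - mean_age(ages, d_c)
w1_ab = w1(d_a, d_b)
w1_ac = w1(d_a, d_c)

print(f"non-crossing: W1 = {w1_ab:.4f}, |gap| = {abs(gap_ab):.4f}")
print(f"crossing:     W1 = {w1_ac:.4f}, |gap| = {abs(gap_ac):.4f}")
```

In the non-crossing pair the two quantities agree up to floating-point error, whereas in the crossing pair $W_1$ exceeds the absolute gap, because positive and negative differences between the survivorship functions cancel in the difference of means but not in $W_1$.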
