Trustworthy predictive distributions for rare events via diagnostic transport maps

Elizabeth Cucuzzella, Rafael Izbicki, Ann B. Lee

Abstract

Forecast systems in science and technology are increasingly moving beyond point prediction toward methods that produce full predictive distributions of future outcomes y, conditional on high-dimensional and complex sequences of inputs x. However, even when forecast systems provide a full predictive distribution, the result is rarely calibrated with respect to all x and y. The estimated density can be especially unreliable in low-frequency or out-of-distribution regimes, where accurate uncertainty quantification and a means for human experts to verify results are most needed to establish trust in models. In this paper, we take an initial predictive distribution as given and treat it as a useful but potentially misspecified base model. We then introduce diagnostic transport maps, covariate-dependent probability-to-probability maps that quantify how the base model's probabilities should be adjusted to better match the true conditional distribution of calibration data. At deployment, these maps provide the user with real-time local diagnostics that reveal where the model fails and how it fails (including bias, dispersion, skewness, and tail errors), while also producing a recalibrated predictive distribution through a simple composition with the base model. We apply diagnostic transport maps to short-term tropical cyclone intensity forecasting and show that an easy-to-fit parametric version identifies evolutionary modes associated with local miscalibration and improves the predictive performance for rare events, including 24-hour rapid intensity change, as compared to the operational forecasts of the National Hurricane Center.

Paper Structure (52 sections, 11 theorems, 81 equations, 8 figures, 2 tables)

This paper contains 52 sections, 11 theorems, 81 equations, 8 figures, 2 tables.

Key Result

Lemma 3.1

Fix $x \in \mathcal{X}$ and assume that the initial predictive CDF $\widehat{F}(\cdot \mid x)$ is strictly increasing and continuous in $y$. For every $y \in \mathcal{Y}$,

Figures (8)

  • Figure 1: Synthetic example illustrating diagnostic transport maps. In this example, the true (unknown) predictive distribution $F(y|x)$ follows a sinh-arcsinh distribution with a shape that varies with $x$; see Section \ref{sec:synthetic} for details. From calibration data (pairs of $(x,y)$-variables), we estimate a diagnostic map that transports a Gaussian base model to a reshaped distribution that better approximates the predictive distribution for all $x$ and $y$. The central image in the figure depicts the first two principal components (PCs) of the input space, with the color of each test point encoding the estimated "local discrepancy score" (Equation \ref{eq:LDS}); this score tells the scientist how far the original base model is from the true predictive distribution at different $x$-values. The surrounding insets show examples of how the scientist can then "zoom in" on specific locations to view the estimated PIT-CDF diagnostics (here, nonparametric diagnostic transport maps), which provide detailed information on the failure; each matching plot to the right shows the base and reshaped PDFs "before" and "after" applying the transport map. The PIT-CDFs of the outer bands of the "data manifold" in PC space signal a positive bias of the base model, whereas the PIT-CDFs of the inner band signal a negative bias. Moving clockwise along the manifold, we see PIT-CDF diagnostics that signal over- to under-dispersion due to changing skewness and tail-weight of the true predictive CDF.
  • Figure 2: Diagnostic maps as the solution to an optimal transport problem. Top (A): A standard optimal transport (OT) map from $\widehat{F}(y\mid x)$ to $F(y\mid x)$ uses quantile matching for fixed $x$ to match calibration data, where the OT map is given by $T_x(y_0) := F^{-1}(\widehat{F}(y_0\mid x)\mid x)$. The OT map rearranges one PDF into another in outcome space, as indicated by the left-right double arrows. In practice, the true target distribution $F$ is not known, and an OT map is also not designed to provide diagnostics of a base model. Bottom (B): A diagnostic transport map constructs an estimate of the entire conditional distribution $F(y\mid x)$ by mapping probabilities, yielding the recalibrated distribution $\widetilde{F}(y_0\mid x) = \widehat{G}_{x}(\widehat{F}(y_0\mid x))$ (Definition \ref{def:recalibrated_PD}). A single regression then yields an estimate of the entire family of OT maps, $\widehat{T}(y_0\mid x) := \widetilde{F}^{-1}(\widehat{F}(y_0\mid x)\mid x)$, for all $x\in \mathcal{X}$ and $y_0 \in \mathcal{Y}$ (Proposition \ref{prop:Tx-G}). That is, the diagnostic map reshapes the original PDF in probability space, as indicated by the top-bottom double arrows, but the end result of matching quantiles or mapping probabilities is the same at every $x$.
  • Figure 3: Convergence rates of parametric versus nonparametric diagnostic transport maps. The left and right panels show the integrated squared error (ISE; Equation \ref{eq:ISE}) as a function of the calibration size $N$ for test points (or "Examples") D and E in Figure \ref{fig:SASDistribution}. The curves and shaded areas represent the average ISE plus/minus two standard deviations for $B=25$ simulated calibration sets. Although parametric transport maps have non-vanishing model bias, their ISE decays much faster than nonparametric transport maps, effectively leading to more accurate estimates of the predictive distributions in the challenging small-$N$ regime, which is relevant for rare events.
  • Figure 4: Diagnostic insights for TC application. Left: Each point in the graph represents a particular 12-hour input sequence $\mathbf{S}_{\leq t}=\{{\mathbf{x}}_{t-12}, {\mathbf{x}}_{t-6}, {\mathbf{x}}_t\}$, where each ${\mathbf{x}}$ corresponds to SHIPS predictors and TC intensity at a particular instance of the storm's evolution. With a parametric diagnostic transport map, we can quickly compute a local discrepancy score (LDS) that assesses how well the initial error distribution of $t+24h$ intensities matches the test data for that particular input sequence. Point A represents a storm sequence with low LDS (good fit). Point B represents a storm sequence that has a high LDS (poor fit). For simplicity, we are only displaying two SHIPS predictors here: the surface wind at times $t-12$ and $t-6$ hours. Center: Diagnostic plots for test points A and B. Note that Panel B has a shape that indicates that the base model of +24-hour errors is positively biased; that is, the predictive distribution of TC intensities at $t+24$ hours is shifted upwards relative to the true TC intensity distribution. Right: Recalibrated PDFs (orange) of +24-hour intensity errors after applying the estimated transport maps to the base models.
  • Figure 5: Example dashboard for TC scientists and forecasters. Diagnostic transport maps provide the user with local diagnostics and a mechanism for reshaping the base model in real time; no additional training is needed at deployment. Hence, a human expert can directly connect model output to physical processes and check whether the correction makes sense. Example B in Figure \ref{fig:LDSAndPP} corresponds to a specific evolutionary mode of Hurricane Irma (2017). This figure shows an example dashboard that can facilitate model verification; see https://drive.google.com/file/d/1gFO8NTo96nZQcEZWoAc5cwo4cqyUvTRh/view?usp=sharing for a video. Top left: Track of Hurricane Irma (2017) at the synoptic times at 6-hour resolution, with the intensity of the storm encoded by NHC's color classification scheme from past track maps \cite{pastracks}. "Current location" corresponds to time $t$. Top right: Six SHIPS predictors shown from $t-24$ to $t$. The red dots mark the values that are included in the 12-hour input sequence $\mathbf{S}_{\leq t}$. Bottom left: TC intensity as a function of time. The red band marks the time interval between $t-12h$ and $t$. The vertical dashed line marks the time $t+24h$ where we make our forecast. Bottom right: The error distribution $\widehat{f}(\epsilon_{t+24h} \mid \mathbf{S}_{\leq t})$ from NHC official forecasts (black dashed) and the reshaped error distribution $\widetilde{f}(\epsilon_{t+24h} \mid \mathbf{S}_{\leq t})$ after applying parametric transport maps (solid orange). Here the recalibration shifts the mean of the base density toward 0 knots (the true intensity error), while also decreasing the variance, resulting in a tighter prediction interval than for the base model.
  • ...and 3 more figures
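
The composition described in the Figure 2 caption, in which a map on probabilities is applied to the base CDF, can be illustrated with a small numerical sketch. The code below is not the authors' implementation. It assumes a single fixed input $x$ (so the covariate dependence and the regression step are dropped), a hypothetical Gaussian base model that is shifted upward relative to a hypothetical Gaussian truth, and a plain empirical CDF of the PIT values as the estimate of $\widehat{G}_x$. All function and variable names are invented for illustration.

```python
from math import erf, sqrt

import numpy as np

# Vectorized error function so the base CDF accepts scalars and arrays.
_erf = np.vectorize(erf)


def gaussian_cdf(y, mean, std):
    """CDF of the (hypothetical) Gaussian base model F_hat(y | x)."""
    z = (np.asarray(y, dtype=float) - mean) / (std * sqrt(2.0))
    return 0.5 * (1.0 + _erf(z))


def fit_pit_cdf(pit_values):
    """Empirical CDF of PIT values: G_hat(gamma) = fraction of PITs <= gamma.

    Assumption: one fixed x, so no regression on covariates is needed.
    """
    sorted_pit = np.sort(np.asarray(pit_values, dtype=float))
    n = sorted_pit.size

    def g_hat(gamma):
        return np.searchsorted(sorted_pit, gamma, side="right") / n

    return g_hat


# Hypothetical setup: the truth is N(0, 1), the base model is N(0.5, 1),
# i.e. the base model is shifted upward (positively biased).
rng = np.random.default_rng(0)
true_mean, true_std = 0.0, 1.0
base_mean, base_std = 0.5, 1.0

# Calibration outcomes drawn from the truth at the fixed x.
y_cal = rng.normal(true_mean, true_std, size=5000)

# PIT values: the base CDF evaluated at the observed outcomes.
pit = gaussian_cdf(y_cal, base_mean, base_std)

# Diagnostic map on probabilities, estimated from the PIT values.
G_hat = fit_pit_cdf(pit)


def recalibrated_cdf(y):
    """Recalibrated CDF F_tilde(y | x) = G_hat(F_hat(y | x))."""
    return G_hat(gaussian_cdf(y, base_mean, base_std))


# Diagnostic reading: for a calibrated base model G_hat(gamma) is close to
# gamma. Here the PIT values pile up near 0, so G_hat lies above the diagonal.
print("G_hat(0.5)          :", float(G_hat(0.5)))
print("base CDF at y=0     :", float(gaussian_cdf(0.0, base_mean, base_std)))
print("recalibrated at y=0 :", float(recalibrated_cdf(0.0)))
```

In this toy setting the recalibrated CDF at the true median should land near 0.5 up to sampling noise, while the base CDF at the same point is visibly too small. Estimating the map as a function of $x$ from calibration pairs, as the paper proposes, would replace the empirical CDF above with a regression of the PIT indicator on $(x, \gamma)$.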

Theorems & Definitions (25)

  • Lemma 3.1
  • Definition 3.2: Recalibrated predictive distribution
  • Corollary 3.3: Error in probability forecasts
  • Proposition 3.4: OT map as a composition involving $G$
  • Definition 5.1: Best-in-family PIT parameter
  • Proposition 5.2: Error in reshaped PD
  • Definition B.1: CRPS
  • Proposition B.2
  • proof
  • Definition C.1
  • ...and 15 more