
On the evaluation of time-to-event, survival time and first passage time forecasts

Robert J. Taggart, Nicholas Loveday, Simon Louis

Abstract

Time-to-event forecasts are essential when decisions depend on event timing. This article develops a framework for evaluating such forecasts when the event has not yet occurred or is not predicted within the forecast horizon. We introduce a theory of provisional evaluation, in which each forecast is assessed against its right-censored realization, defined as the minimum of the event time and the evaluation time. For probabilistic forecasts, we show that strictly proper scoring rules induce provisionally strictly proper scoring rules, whose expected score, computed from the right-censored realization, is optimized under truthful forecasting. Threshold-weighted versions of the continuous ranked probability score and the logarithmic score satisfy this property. We also develop a theory for scoring point (single-valued) forecasts under right-censoring. Quantile and interquartile range forecasts are shown to be provisionally elicitable, meaning that scoring functions exist for which these functionals uniquely minimize the expected score, whereas the expectation functional is not provisionally elicitable. A synthetic experiment demonstrates that the proposed scores correctly rank forecasters. Diagnostic tools, including Murphy diagrams and reliability diagrams, extend naturally. Applications to operational time-to-flood and time-to-strong-wind forecasts illustrate the approach.
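The idea of scoring a predictive distribution against its right-censored realization can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes exponential time-to-event distributions, a CRPS restricted to the interval from 0 to the evaluation time (one possible threshold-weighted CRPS), and midpoint-rule integration. The names `censor`, `exp_cdf`, `provisional_crps` and `mean_scores`, and all parameter values, are hypothetical.

```python
import math
import random


def censor(event_time, tau):
    """Right-censored realization: the minimum of event time and evaluation time."""
    return min(event_time, tau)


def exp_cdf(rate):
    """CDF of an exponential time-to-event distribution (an assumed forecast family)."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0
    return cdf


def provisional_crps(cdf, event_time, tau, n=300):
    """CRPS restricted to [0, tau], evaluated at the censored realization.

    Approximates the integral of (F(x) - 1{x >= y})^2 over [0, tau] with the
    midpoint rule, where y = min(event_time, tau).
    """
    y = censor(event_time, tau)
    dx = tau / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        step = 1.0 if x >= y else 0.0
        total += (cdf(x) - step) ** 2
    return total * dx


def mean_scores(true_rate=0.5, tau=3.0, n_cases=1000, seed=0):
    """Mean provisional score for a truthful and two misspecified forecasters."""
    rng = random.Random(seed)
    events = [rng.expovariate(true_rate) for _ in range(n_cases)]
    forecasters = {
        "truthful": exp_cdf(true_rate),
        "too_slow": exp_cdf(0.2),   # predicts events later than they occur
        "too_fast": exp_cdf(1.5),   # predicts events earlier than they occur
    }
    return {
        name: sum(provisional_crps(cdf, t, tau) for t in events) / n_cases
        for name, cdf in forecasters.items()
    }


if __name__ == "__main__":
    for name, score in mean_scores().items():
        print(f"{name}: {score:.4f}")
```

In this sketch the score depends on the event time only through its censored value, so events that have not occurred by the evaluation time can still be scored, and lower mean scores are better.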

Paper Structure (15 sections, 12 theorems, 51 equations, 5 figures, 6 tables)


Key Result

Theorem 1

Suppose that $\mathcal{F}$ and $\mathcal{F}_*$ are two classes of distributions on $I$ such that $[F]_\tau\in\mathcal{F}_*$ whenever $F\in\mathcal{F}$. If $S$ is a strictly proper scoring rule relative to $\mathcal{F}_*$ then the scoring rule $S_\tau$, defined by $S_\tau(F,t)=S([F]_\tau,t)$, is prov
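As a worked illustration of the construction $S_\tau(F,t)=S([F]_\tau,t)$, consider the following sketch. It rests on two assumptions that are not stated in the excerpt above: that $[F]_\tau$ denotes the distribution of $\min(Y,\tau)$ when $Y\sim F$, so that $[F]_\tau(x)=F(x)$ for $x<\tau$ and $[F]_\tau(x)=1$ for $x\ge\tau$, and that $S$ is the continuous ranked probability score. For a right-censored realization $t\le\tau$,

$$
S_\tau(F,t)=\int_I \big([F]_\tau(x)-\mathbf{1}\{x\ge t\}\big)^2\,\mathrm{d}x
=\int_{I\cap(-\infty,\tau)} \big(F(x)-\mathbf{1}\{x\ge t\}\big)^2\,\mathrm{d}x ,
$$

since for $x\ge\tau$ both $[F]_\tau(x)$ and $\mathbf{1}\{x\ge t\}$ equal 1 and the integrand vanishes. Under these assumptions $S_\tau$ is a CRPS with threshold weight $\mathbf{1}\{x<\tau\}$, which is in line with the abstract's statement that threshold-weighted versions of the CRPS have the provisional propriety property.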

Figures (5)

  • Figure 1: Example of time-to-event predictive PDFs and survival functions of the five forecasters in the synthetic experiment, along with the corresponding realization. The top row illustrates the distributions of the forecasters who make ideal forecasts, but with access to different information. The bottom row shows the distributions of the forecasters who have access to the same information, but where two of the forecasters' distributions are misspecified.
  • Figure 2: (a) Brier score decomposition of CRPS for predictive distributions and (b) elementary score decomposition of quantile loss for 0.9-quantile forecasts, against decision threshold, for the forecasters from the synthetic experiment. For each decision threshold, a lower mean score is preferred.
  • Figure 3: River height forecasts (gray) for North Richmond from (a) EPS A and (b) EPS B with observed heights (black lines). Horizontal lines mark thresholds for minor, moderate, and major flooding. Empirical CDFs for first passage time for (c) EPS A and (d) EPS B for the minor, moderate and major flood thresholds. Vertical dotted lines indicate realized first passage times for minor and moderate thresholds; the major threshold was not exceeded. Time 0 corresponds to model initialization at 00:00 UTC on 6 June 2024.
  • Figure 4: Wind speed forecasts for Botany Bay from the ACCESS model (initialization 3 January 2023 00:00 UTC) and corresponding observations.
  • Figure 5: Reliability curves from isotonic regression using Botany Bay data: (a) 2023 time-to-event forecasts derived from bias-corrected wind speed forecasts, (b) 2024 uncalibrated and calibrated time-to-event quantile forecasts.

Theorems & Definitions (30)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Definition 5
  • Definition 6
  • ...and 20 more