Locally tail-scale invariant scoring rules for evaluation of extreme value forecasts

Helga Kristin Olafsdottir, Holger Rootzén, David Bolin

TL;DR

The concept of local weight-scale invariance is proposed, describing scoring rules that fulfil local scale invariance in a certain region of interest, with local tail-scale invariance, for large events, as a special case.

Abstract

Statistical analysis of extremes can be used to predict the probability of future extreme events, such as large rainfalls or devastating windstorms. The quality of these forecasts can be measured through scoring rules. Locally scale invariant scoring rules give equal importance to the forecasts at different locations regardless of differences in the prediction uncertainty. This is a useful feature when computing average scores but can be an unnecessarily strict requirement when one is mostly concerned with extremes. We propose the concept of local weight-scale invariance, describing scoring rules that fulfil local scale invariance in a certain region of interest, and, as a special case, local tail-scale invariance, for large events. Moreover, a new version of the weighted Continuous Ranked Probability Score (wCRPS), called the scaled wCRPS (swCRPS), that possesses this property is developed and studied. The score is a suitable alternative for scoring extreme value models over areas with varying scale of extreme events, and we derive explicit formulas for the score under the Generalised Extreme Value distribution. The scoring rules are compared through simulation, and their usage is illustrated in modelling of extreme water levels and annual maximum rainfalls, and in an application to non-extreme forecasts for the prediction of air pollution.
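
The abstract's point about averaging scores over locations with different prediction uncertainty can be illustrated with the ordinary (unweighted) CRPS, which scales linearly with the data. The sketch below is not taken from the paper: it assumes the common sample-based CRPS estimate in negative orientation (smaller is better; sign conventions differ between papers), and the ensemble, the observation and the factor `scale` are hypothetical values chosen only for illustration.

```python
import numpy as np

def crps_sample(samples, y):
    """Sample-based CRPS estimate, negatively oriented (smaller is better):
    E|X - y| - 0.5 * E|X - X'|, with X, X' independent draws from the forecast."""
    x = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(x - y))
    term_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term_obs - term_spread

rng = np.random.default_rng(0)
base = rng.gumbel(size=500)  # hypothetical forecast ensemble at a low-scale site
y = 1.0                      # hypothetical observation at that site
scale = 10.0                 # hypothetical site whose values are ten times larger

low = crps_sample(base, y)
high = crps_sample(scale * base, scale * y)  # same forecast quality, larger scale

# The ratio equals `scale` up to floating-point error, so an average over both
# sites would be dominated by the high-scale site.
print(low, high, high / low)
```

Because the same forecast receives a score ten times larger at the second site, a plain average of CRPS values weights locations by their scale, which is the behaviour that locally scale invariant scores are meant to avoid.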

Paper Structure (27 sections, 6 theorems, 64 equations, 10 figures, 5 tables)

This paper contains 27 sections, 6 theorems, 64 equations, 10 figures, 5 tables.

Key Result

Proposition 3.5

The wCRPS, as defined in Eq. \eqref{eq:wCRPS}, is a kernel score.
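
To give a feel for what a kernel-score representation looks like, here is a minimal sketch. The paper's own wCRPS definition and sign convention are not reproduced on this page, so the code assumes the standard threshold-weighted CRPS from the literature with the indicator weight $w(z)=\mathbb{1}\{z\ge t\}$, written in negative orientation. Under that assumption the score can be written with the kernel $g(x,x')=|v(x)-v(x')|$, where $v(z)=\max(z,t)$. The function names, the ensemble and the threshold are hypothetical.

```python
import numpy as np

def tw_crps_kernel(samples, y, t):
    """Threshold-weighted CRPS in kernel form for the weight w(z) = 1{z >= t}.

    Uses v(z) = max(z, t), so the kernel is g(x, x') = |v(x) - v(x')| and the
    score is E g(X, y) - 0.5 * E g(X, X')  (smaller is better).
    """
    vx = np.maximum(np.asarray(samples, dtype=float), t)
    vy = max(float(y), t)
    return np.mean(np.abs(vx - vy)) - 0.5 * np.mean(np.abs(vx[:, None] - vx[None, :]))

def tw_crps_integral(samples, y, t, n_grid=400_001):
    """Same score from the integral form int_t^inf (F(z) - 1{y <= z})^2 dz,
    with F the empirical CDF of the samples, approximated on a grid."""
    x = np.sort(np.asarray(samples, dtype=float))
    upper = max(x[-1], y, t) + 1.0           # integrand is zero beyond this point
    z = np.linspace(t, upper, n_grid)
    cdf = np.searchsorted(x, z, side="right") / x.size
    integrand = (cdf - (z >= y)) ** 2
    return np.mean(integrand) * (upper - t)  # simple Riemann approximation

rng = np.random.default_rng(1)
ens = rng.normal(size=300)  # hypothetical forecast ensemble
obs, thr = 0.5, 1.0         # hypothetical observation and threshold

score_kernel = tw_crps_kernel(ens, obs, thr)
score_integral = tw_crps_integral(ens, obs, thr)
print(score_kernel, score_integral)  # the two forms agree up to grid error
```

Setting the threshold to `-np.inf` makes `v` the identity, so the kernel form reduces to the usual sample-based CRPS.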

Figures (10)

  • Figure 1: Benchmark model simulation showing power of a two-sided pairwise t-test when scoring 1000 independently simulated stationary time series of length 1000 from the ideal model $Y|Z\sim \text{Exp}(Z)$, $Z\sim\text{Gamma}(\xi^{-1},\xi^{-1})$ using the extremist model with parameter $\nu\in(1.001,2)$ and shape parameters $\xi=0.25$ (left) and $\xi=0.5$ (right).
  • Figure 2: Simulated mean (top) and standard deviation (bottom) of the score difference ${S(\mathbb{Q}_{\bm{\theta}},\mathbb{Q}_{\bm{\theta}})-S(\mathbb{P},\mathbb{Q}_{\bm{\theta}})}$ using wCRPS (left) and swCRPS (right) for two predictions of a random variable $X\sim \mathbb{Q}_{\bm{\theta}}=GEV(\mu=0,\sigma,\gamma=0.12)$, with $\mathbb{P}=GEV(\mu=0,2\sigma,\gamma=0.12)$, as functions of the scale parameter $\sigma$ for different thresholds $q(p)$, chosen as the $p$-th quantile from $\mathbb{Q}$. Threshold $q(-\infty)$ results in the unweighted scores, CRPS and SCRPS.
  • Figure 3: Simulated expected score $S(\mathbb{P},\mathbb{Q})$ using wCRPS and swCRPS for a pair $(X_1,X_2)$ of random variables with $X_i\sim GEV(\mu_i,\sigma_i,\gamma)$, as functions of $k_i$, $i=1,2$, using a prediction that has the correct location parameters, $\mu_i$, and scale parameters $\widehat{\sigma_i}=k_i\sigma_i$, $i=1,2$. For the true model, $\mu_1=\mu_2=0$, $\gamma=0.12$, $\sigma_1 = 1.5$ and $\sigma_2 = 3$. The weight-function threshold was chosen as the 0.90 quantile for each score.
  • Figure 4: Means and standard deviations of score differences, $\Delta_i$, at stations $i\in\{1,2,3,4,5\}$ with $k=1.5$, computed with different CRPS-type scores on data simulated from the parameters estimated at the stations listed in Table \ref{tab:great:lakes:gev:fit}.
  • Figure 5: Proportion of times model A was preferred over model B when simulating time series of length 100 when using wCRPS and swCRPS, with individual shape parameter (left) and joint shape parameter (right). The score threshold is chosen as the $p\%$ quantile.
  • ...and 5 more figures

Theorems & Definitions (17)

  • Definition 2.1
  • Definition 3.1
  • Remark 3.2
  • Remark 3.3
  • Definition 3.4
  • Proposition 3.5
  • Proposition 3.7
  • Proposition 3.8
  • Proposition 3.9
  • Proposition 3.10
  • ...and 7 more