Locally tail-scale invariant scoring rules for evaluation of extreme value forecasts

Helga Kristin Olafsdottir; Holger Rootzén; David Bolin

TL;DR

The concept of local weight-scale invariance is proposed, describing scoring rules that fulfil local scale invariance in a certain region of interest, with local tail-scale invariance, for large events, as a special case.

Abstract

Statistical analysis of extremes can be used to predict the probability of future extreme events, such as large rainfalls or devastating windstorms. The quality of these forecasts can be measured through scoring rules. Locally scale invariant scoring rules give equal importance to the forecasts at different locations regardless of differences in the prediction uncertainty. This is a useful feature when computing average scores but can be an unnecessarily strict requirement when mostly concerned with extremes. We propose the concept of local weight-scale invariance, describing scoring rules fulfilling local scale invariance in a certain region of interest, and as a special case local tail-scale invariance, for large events. Moreover, a new version of the weighted continuous ranked probability score (wCRPS) called the scaled wCRPS (swCRPS) that possesses this property is developed and studied. The score is a suitable alternative for scoring extreme value models over areas with varying scale of extreme events, and we derive explicit formulas of the score for the Generalised Extreme Value distribution. The scoring rules are compared through simulation, and their usage is illustrated in modelling of extreme water levels, annual maximum rainfalls, and in an application to non-extreme forecasts for the prediction of air pollution.
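The scale dependence that motivates the abstract can be illustrated with a minimal sketch. It assumes the standard, negatively oriented sample form of the CRPS and a threshold-weighted CRPS with indicator weight $w(z)=\mathbb{1}\{z\geq q\}$; the paper's own wCRPS and swCRPS definitions, including their orientation, may differ. The function names `crps_sample` and `tw_crps_sample` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def crps_sample(x, y):
    """CRPS of the empirical distribution of samples x at observation y.

    Assumption: negatively oriented form E|X - y| - 0.5 * E|X - X'|
    (smaller is better); the paper may use the opposite sign.
    """
    x = np.asarray(x, dtype=float)
    e_xy = np.mean(np.abs(x - y))
    e_xx = np.mean(np.abs(x[:, None] - x[None, :]))
    return e_xy - 0.5 * e_xx


def tw_crps_sample(x, y, q):
    """Threshold-weighted CRPS with weight w(z) = 1{z >= q}.

    Uses the transformation v(z) = max(z, q): applying the plain CRPS
    to v(x) and v(y) gives the weighted score for this indicator weight.
    """
    x = np.asarray(x, dtype=float)
    return crps_sample(np.maximum(x, q), max(y, q))


# Illustration: rescaling forecast, observation and threshold by s
# rescales both scores by s, so locations with larger scale dominate
# an average of such scores.
rng = np.random.default_rng(0)
x = rng.gumbel(size=500)   # forecast samples (arbitrary choice)
y, q, s = 1.3, 1.0, 3.0    # observation, threshold, scale factor
print(crps_sample(x, y), crps_sample(s * x, s * y))
print(tw_crps_sample(x, y, q), tw_crps_sample(s * x, s * y, s * q))
```

Both printed pairs differ by the factor `s`. This linear growth with scale is what locally scale invariant rules, and the tail-restricted variants proposed in the paper, are designed to remove.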

Paper Structure (27 sections, 6 theorems, 64 equations, 10 figures, 5 tables)

Introduction
Background
The CRPS and weighted scoring rules for extremes
Local scale invariance and kernel scores
Scaled weighted scoring rules
Local weight-scale invariance
Scaled weighted CRPS
Local weight-scale invariance of the censored likelihood score
Simulation studies
Benchmark example
Score dependence on scale and threshold
Scaling effect on expected scores
Case studies
Extreme water levels
Simulation
...and 12 more sections

Key Result

Proposition 3.5

The wCRPS, as defined in Eq. \eqref{eq:wCRPS}, is a kernel score.
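For orientation, the kernel-score form can be written out for the standard threshold-weighted CRPS. This is a sketch under stated assumptions, not the paper's statement: the equation referenced above is not reproduced here, so the negatively oriented form with a non-negative weight function $w$ is assumed, and $v$ denotes any antiderivative of $w$. The symbols $g$ and $v$ are introduced only for this illustration.

```latex
\begin{align*}
\mathrm{wCRPS}(F, y)
  &= \int_{-\infty}^{\infty} w(z)\,\bigl(F(z) - \mathbb{1}\{y \le z\}\bigr)^2 \,\mathrm{d}z \\
  &= \mathbb{E}\,\lvert v(X) - v(y)\rvert
     - \tfrac{1}{2}\,\mathbb{E}\,\lvert v(X) - v(X')\rvert ,
\qquad v(x) - v(x') = \int_{x'}^{x} w(z)\,\mathrm{d}z ,
\end{align*}
% X and X' are independent draws from F. The second line is a kernel
% score E g(X, y) - (1/2) E g(X, X') with kernel g(x, x') = |v(x) - v(x')|.
```

With the indicator weight $w(z)=\mathbb{1}\{z\geq q\}$ one may take $v(z)=\max(z,q)$, so the weighted score is the ordinary CRPS applied to the censored variables.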

Figures (10)

Figure 1: Benchmark model simulation showing power of a two-sided pairwise t-test when scoring 1000 independently simulated stationary time series of length 1000 from the ideal model $Y|Z\sim \text{Exp}(Z)$, $Z\sim\text{Gamma}(\xi^{-1},\xi^{-1})$ using the extremist model with parameter $\nu\in(1.001,2)$ and shape parameters $\xi=0.25$ (left) and $\xi=0.5$ (right).
Figure 2: Simulated mean (top) and standard deviation (bottom) of the score difference ${S(\mathbb{Q}_{\bm{\theta}},\mathbb{Q}_{\bm{\theta}})-S(\mathbb{P},\mathbb{Q}_{\bm{\theta}})}$ using wCRPS (left) and swCRPS (right) for two predictions of a random variable $X\sim \mathbb{Q}_{\bm{\theta}}=GEV(\mu=0,\sigma,\gamma=0.12)$, with $\mathbb{P}=GEV(\mu=0,2\sigma,\gamma=0.12)$, as functions of the scale parameter $\sigma$ for different thresholds $q(p)$, chosen as the $p$-th quantile from $\mathbb{Q}$. Threshold $q(-\infty)$ results in the unweighted scores, CRPS and SCRPS.
Figure 3: Simulated expected score $S(\mathbb{P},\mathbb{Q})$ using wCRPS and swCRPS for a pair $(X_1,X_2)$ of random variables with $X_i\sim GEV(\mu_i,\sigma_i,\gamma)$, as functions of $k_i$, $i=1,2$ using a prediction that has the correct location parameters, $\mu_i$, and scale parameters $\widehat{\sigma_i}=k_i\sigma_i$, $i=1,2$. For the true model, $\mu_1=\mu_2=0$, $\gamma=0.12$, $\sigma_1 = 1.5$ and $\sigma_2 = 3$. The weight function was chosen as the 0.90 quantile for each score.
Figure 4: Mean and standard deviations of score differences, $\Delta_i$, at stations $i\in\{1,2,3,4,5\}$ with $k=1.5$, using different types of CRPS scores on simulated data using the estimated parameters from the stations listed in Table \ref{tab:great:lakes:gev:fit}.
Figure 5: Proportion of times model A was preferred over model B when simulating time series of length 100 when using wCRPS and swCRPS, with individual shape parameter (left) and joint shape parameter (right). The score threshold is chosen as the $p\%$ quantile.
...and 5 more figures

Theorems & Definitions (17)

Definition 2.1
Definition 3.1
Remark 3.2
Remark 3.3
Definition 3.4
Proposition 3.5
Proposition 3.7
Proposition 3.8
Proposition 3.9
Proposition 3.10
...and 7 more
