Sequential model confidence sets

Sebastian Arnold, Georgios Gavrilopoulos, Benedikt Schulz, Johanna Ziegel

TL;DR

This paper introduces sequential model confidence sets (SMCS) for monitoring forecast superiority in a streaming-data setting, with time-uniform, nonasymptotic coverage guarantees obtained from e-processes and confidence sequences. It distinguishes three notions of superiority (strongly superior, uniformly weakly superior, and weakly superior) and develops an SMCS construction for each, using the closure principle together with appropriate e-processes or confidence regions. Simulations and two case studies (Covid-19 death forecasts and wind gust postprocessing) show that SMCS provide safe, anytime-valid inference for sequential forecast evaluation, including dynamic narrowing of the model set and optional stopping. The authors also discuss extensions (e.g., marginal coverage, FDR control), transformations that enforce boundedness, and future directions such as applying SMCS to information criteria and sequential model selection.

Abstract

In most prediction and estimation situations, scientists consider various statistical models for the same problem, and naturally want to select among the best. Hansen et al. (2011) provide a powerful solution to this problem with the so-called model confidence set, a subset of the original set of available models that contains the best models with a given level of confidence. Importantly, model confidence sets respect the underlying selection uncertainty by being flexible in size. However, they presuppose a fixed sample size, which stands in contrast to the fact that model selection and forecast evaluation are inherently sequential tasks where we successively collect new data and where the decision to continue or conclude a study may depend on the previous outcomes. In this article, we extend model confidence sets sequentially over time by relying on sequential testing methods. Recently, e-processes and confidence sequences have been introduced as new, safe methods for assessing statistical evidence. Sequential model confidence sets allow one to continuously monitor the models' performance and come with time-uniform, nonasymptotic coverage guarantees.
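
As an illustration of the kind of e-process the abstract refers to, the following minimal sketch builds a fixed-bet test supermartingale for a single pair of models. It is not the paper's construction. It assumes loss differences $d_t = L_{i,t} - L_{j,t}$ that have already been transformed to lie in $[-1, 1]$, and a null hypothesis under which the conditional mean of $d_t$ given the past is at most zero at every time step. The function names, the betting fraction `lam`, and the example data are hypothetical.

```python
from itertools import accumulate


def betting_e_process(diffs, lam=0.5):
    """Running product of (1 + lam * d_t) for loss differences d_t in [-1, 1].

    Assumption: under the null, E[d_t | past] <= 0 for every t. Each factor is
    then nonnegative with conditional mean at most 1, so the running product
    is a nonnegative supermartingale starting from 1.
    """
    diffs = list(diffs)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] to keep every factor nonnegative")
    if any(abs(d) > 1.0 for d in diffs):
        raise ValueError("loss differences must lie in [-1, 1]")
    return list(accumulate((1.0 + lam * d for d in diffs), lambda a, b: a * b))


def first_rejection_time(e_values, alpha=0.1):
    """First time (1-indexed) the e-process reaches 1/alpha, or None if never.

    By Ville's inequality, a nonnegative supermartingale started at 1 ever
    reaches 1/alpha with probability at most alpha, at any stopping time.
    """
    threshold = 1.0 / alpha
    for t, e in enumerate(e_values, start=1):
        if e >= threshold:
            return t
    return None


# Hypothetical data: model i loses to model j by the maximal margin three times.
e_values = betting_e_process([1.0, 1.0, 1.0], lam=0.5)
```

Because the threshold $1/\alpha$ holds uniformly over time, the monitoring can be stopped or continued at any point without invalidating the error guarantee. A fixed `lam` is chosen here only for simplicity; data-dependent bets that use past observations only would preserve the supermartingale property as well.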

Paper Structure (32 sections, 11 theorems, 45 equations, 12 figures, 2 algorithms)

Key Result

Theorem 3.1

For any $\alpha \in (0,1)$, the sequence $(\widehat{\mathcal{M}}_t)_{t \in \mathbb{N}}$ defined in eq:def_SMCS_strong_hypthesis_e_process is an SMCS at level $\alpha$ for $\mathcal{M}^{\bullet, \star}$, $\bullet \in \{\mathrm{s},\mathrm{uw}\}$, and so is its running intersection $\widetilde{\mathcal{M}}$ …
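
The running intersection mentioned in the theorem can be sketched in a few lines. The snippet below assumes the confidence sets are available as a list of Python sets of model labels, one per time step; the function name and the example labels are hypothetical, and how each individual set is computed is outside the scope of this sketch.

```python
from itertools import accumulate


def running_intersection(confidence_sets):
    """Return, for each time t, the intersection of all sets up to and including t."""
    return list(accumulate(confidence_sets, lambda acc, s: acc & s))


# Hypothetical sequence of confidence sets over three time steps.
sets_over_time = [{"A", "B", "C"}, {"B", "C"}, {"A", "C"}]
narrowed = running_intersection(sets_over_time)
```

In this example model `"A"` re-enters the third set but stays excluded from the running intersection, so the intersected sequence can only shrink over time.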

Figures (12)

  • Figure 1: The average number of models in the SMCS in Simulations 1 and 2. At the end of the evaluation period, the SMCSs have an average size of $8.41$ and $9.95$, respectively. For both simulations, the SMCS never wrongly excludes the best model $i_0$.
  • Figure 2: Left: Realized accumulated losses $\sum_{r=1}^t L_{i,r}$ for one realization: Worsening forecaster (green), improving forecaster (blue), constantly biased forecaster (red). The black vertical lines indicate $t=154$ and $t=550$. The resulting SMCS is given in the upper part with the respective colors. Right: Average size of the SMCS over $N=100$ realizations.
  • Figure 3: Time progression of predicted and actual Covid-related mortality in linear (upper left) and log scale (upper right). For the forecasts of the tails, the differences between the models are more pronounced than for the median; see the lower panels for $\tau=0.15,0.975$.
  • Figure 4: SMCSs at four selected quantile levels with confidence level $\alpha=0.1$.
  • Figure 5: SMCS of averaged loss differences over all stations (left) and the number of stations where a method is included in the SMCS as a function of time (right). The vertical black lines indicate the three major NWP model updates.
  • ...and 7 more figures

Theorems & Definitions (25)

  • Theorem 3.1
  • Remark 1
  • Remark 2
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • Theorem 3.4
  • Proposition 3.5
  • Proposition 3.6
  • Remark 3
  • ...and 15 more