Sequential model confidence sets
Sebastian Arnold, Georgios Gavrilopoulos, Benedikt Schulz, Johanna Ziegel
TL;DR
This paper introduces sequential model confidence sets (SMCS) to monitor forecast superiority in a streaming-data setting, providing time-uniform, nonasymptotic coverage guarantees via e-processes and confidence sequences. It distinguishes three notions of superiority (strongly superior, uniformly weakly superior, and weakly superior) and develops an SMCS construction for each, using the closure principle together with suitable e-processes or confidence regions. Simulations and two case studies (Covid-19 death forecasts and wind gust postprocessing) show that SMCS offer safe, anytime-valid inference and actionable insights for sequential forecast evaluation, including dynamic model narrowing and optional stopping. The authors also discuss extensions (e.g., marginal coverage, FDR control), transformation tricks for boundedness, and future directions such as applying SMCS to information criteria and sequential model selection.
Abstract
In most prediction and estimation situations, scientists consider various statistical models for the same problem and naturally want to select among the best of them. Hansen et al. (2011) provide a powerful solution to this problem with the so-called model confidence set, a subset of the original set of available models that contains the best models with a given level of confidence. Importantly, model confidence sets respect the underlying selection uncertainty by being flexible in size. However, they presuppose a fixed sample size, which stands in contrast to the fact that model selection and forecast evaluation are inherently sequential tasks in which we successively collect new data and in which the decision to continue or conclude a study may depend on the previous outcomes. In this article, we extend model confidence sets sequentially over time by relying on sequential testing methods. Recently, e-processes and confidence sequences have been introduced as new, safe methods for assessing statistical evidence. Sequential model confidence sets allow one to continuously monitor the models' performances and come with time-uniform, nonasymptotic coverage guarantees.
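To illustrate what a time-uniform guarantee looks like in practice, the following is a minimal sketch of a betting-style e-process for a single pairwise forecast comparison. It is not the construction used in the paper, which combines e-processes over sets of models via the closure principle. The function names `e_process` and `first_rejection_time`, the fixed bet size `lam`, and the assumption that loss differences have been rescaled to lie in [-1, 1] are all illustrative choices. Here d_t denotes the loss of model A minus the loss of model B at time t, and the null hypothesis is that A is at least as good as B, that is, the conditional expectation of d_t given the past is at most 0.

```python
def e_process(score_diffs, lam=0.5):
    """Running wealth of a fixed-bet e-process.

    Null hypothesis: E[d_t | past] <= 0 for all t, with d_t in [-1, 1].
    Under the null, each factor 1 + lam * d_t is nonnegative with
    conditional mean at most 1, so the running product is a nonnegative
    supermartingale started at 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    wealth = 1.0
    path = []
    for d in score_diffs:
        if not -1.0 <= d <= 1.0:
            raise ValueError("score differences must lie in [-1, 1]")
        wealth *= 1.0 + lam * d
        path.append(wealth)
    return path


def first_rejection_time(path, alpha=0.1):
    """First time (1-indexed) the e-process reaches 1 / alpha, else None.

    By Ville's inequality, under the null the probability that the
    process ever reaches 1 / alpha is at most alpha, so the rejection
    rule stays valid under continuous monitoring and optional stopping.
    """
    threshold = 1.0 / alpha
    for t, e in enumerate(path, start=1):
        if e >= threshold:
            return t
    return None
```

For example, `first_rejection_time(e_process(diffs), alpha=0.1)` returns the first time at which the data provide enough evidence, at level 0.1, that model A is worse than model B, or `None` if that never happens within the observed stream. A fixed `lam` keeps the sketch short; adaptive bet sizes are common in practice but are outside the scope of this illustration.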
