Model selection confidence sets for time series models with applications to electricity load data
Piersilvio De Bortoli, Davide Ferrari, Francesco Ravazzolo, Luca Rossini
TL;DR
This paper tackles model selection uncertainty in univariate time series by introducing the Model Selection Confidence Set (MSCS) for ARMAX models: a set of specifications that are statistically indistinguishable at a chosen confidence level $(1-\alpha)$. The authors construct the MSCS using likelihood ratio tests against a full model, prove asymptotic coverage, and propose Lower Boundary Models (LBMs) plus a union model to balance parsimony and robustness. Through Monte Carlo simulations, they characterize finite-sample behavior, including how MSCS size and LBM composition respond to sample size and model complexity. They then apply the MSCS to Italian hourly electricity load data, revealing time-varying selection uncertainty and identifying core drivers such as intraday lags, temperature, calendar effects, and solar generation, while showing that MSCS-based forecasts are robust and competitive. Overall, the work provides a formal, data-driven approach to quantifying and exploiting model uncertainty for improved forecasting and interpretation in energy economics and time-series analysis.
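For concreteness, a likelihood-ratio construction of this kind can be sketched as follows. The notation is assumed here for illustration and need not match the paper's: $\mathcal{M}$ is the candidate set, $\gamma_f$ the full model, $\ell_\gamma(\hat\theta_\gamma)$ the maximized log-likelihood of model $\gamma$, and $d_\gamma$ its number of free parameters.

$$
\widehat{\Gamma}_\alpha = \left\{ \gamma \in \mathcal{M} : 2\left[ \ell_{\gamma_f}(\hat\theta_{\gamma_f}) - \ell_{\gamma}(\hat\theta_{\gamma}) \right] \le q_{1-\alpha}\!\left(d_{\gamma_f} - d_{\gamma}\right) \right\}
$$

Here $q_{1-\alpha}(k)$ denotes the $(1-\alpha)$ quantile of a $\chi^2$ distribution with $k$ degrees of freedom, and the full model is included by convention. Under this sketch, asymptotic coverage means that the probability that the true model $\gamma^*$ belongs to $\widehat{\Gamma}_\alpha$ is at least $1-\alpha$ in the limit as the sample size grows.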
Abstract
This paper studies the Model Selection Confidence Set (MSCS) methodology for univariate time series models involving autoregressive and moving average components, and applies it to study model selection uncertainty in Italian electricity load data. Rather than relying on a single model selected by an arbitrary criterion, the MSCS identifies a set of models that are statistically indistinguishable from the true data-generating process at a given confidence level. The size and composition of this set reveal crucial information about model selection uncertainty: noisy data scenarios produce larger sets with many candidate models, while more informative cases narrow the set considerably. To study the importance of each model term, we consider numerical statistics measuring the frequency with which each term is included in the entire MSCS and in the Lower Boundary Models (LBMs), its most parsimonious specifications. Applied to Italian hourly electricity load data, the MSCS methodology reveals marked intraday variation in model selection uncertainty and isolates a collection of model specifications that deliver competitive short-term forecasts while highlighting key drivers of electricity load such as intraday hourly lags, temperature, calendar effects, and solar energy generation.
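The term-importance statistics described above can be illustrated with a minimal sketch. It assumes each model is represented as a set of term labels, and that an LBM is a model in the MSCS with no proper submodel also in the MSCS; the function names and the example term labels (`lag24`, `temp`, `holiday`, `solar`) are hypothetical and not taken from the paper.

```python
from collections import Counter


def lower_boundary_models(mscs):
    """Return the models in `mscs` that have no proper submodel in `mscs`.

    Assumption: this is the working definition of an LBM for this sketch.
    """
    models = [frozenset(m) for m in mscs]
    # `other < m` is a strict-subset test, so a model never excludes itself.
    return [m for m in models if not any(other < m for other in models)]


def inclusion_frequency(models):
    """Return, for each term, the share of models that contain it."""
    models = list(models)
    counts = Counter(term for m in models for term in m)
    return {term: counts[term] / len(models) for term in counts}


# Hypothetical MSCS with four candidate specifications.
mscs = [
    {"lag24", "temp"},
    {"lag24", "temp", "holiday"},
    {"lag24", "solar"},
    {"lag24", "temp", "solar"},
]

lbms = lower_boundary_models(mscs)
freq_all = inclusion_frequency(mscs)  # frequencies over the whole MSCS
freq_lbm = inclusion_frequency(lbms)  # frequencies over the LBMs only
```

In this toy example, a term that appears in every LBM (here `lag24`) is one that no retained specification can do without, whereas a term that appears only in larger models of the set (here `holiday`) is not needed by any of the most parsimonious ones.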
