Regularized Ensemble Forecasting for Learning Weights from Historical and Current Forecasts
Han Su, Xiaojia Guo, Xiaoke Zhang
TL;DR
This paper introduces Regularized Ensemble Forecasting (REF), a flexible framework that optimally combines current expert forecasts with historical performance through a regularized variance-minimization objective. By allowing different choices of monotone transformation and penalty function, REF can emulate Bayesian posterior updates under various priors and distributions, providing a principled link between frequentist ensemble weights and Bayesian inference. The method can be implemented in practice with a softmax parameterization and rolling-window tuning of the regularization strength, with nuisance parameters estimated from data. Empirically, REF outperforms standard current-only and history-only baselines on the M5 Walmart dataset and on SPF macroeconomic forecasts, and the results offer insight into when current forecasts or historical performance drive the improvements. The work highlights the adaptability of REF to changing expert pools and horizons and points to future extensions to probabilistic forecasts and non-continuous outcomes.
Abstract
Combining forecasts from multiple experts often yields more accurate results than relying on a single expert. In this paper, we introduce a novel regularized ensemble method that extends the traditional linear opinion pool by leveraging both current forecasts and historical performance to set the weights. Unlike existing approaches that rely on either the current forecasts or past accuracy alone, our method accounts for both sources simultaneously. It learns weights by minimizing the variance of the combined forecast (or of a transformed version of it) while incorporating a regularization term informed by historical performance. We also show that this approach has a Bayesian interpretation. Different distributional assumptions within this Bayesian framework yield different functional forms for the variance component and the regularization term, adapting the method to various scenarios. In empirical studies on Walmart sales and macroeconomic forecasting, our ensemble outperforms leading benchmark models both when experts' full forecasting histories are available and when experts enter and exit over time, resulting in incomplete historical records. Throughout, we provide illustrative examples that show how the optimal weights are determined and, based on the empirical results, we discuss where the framework's strengths lie and when experts' past versus current forecasts are more informative.
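To make the weight-learning idea concrete, the following is a minimal sketch under stated assumptions, not the paper's exact objective. It assumes the variance term is the weighted spread of the current forecasts around the combined forecast, that the history-informed penalty is a KL divergence from prior weights proportional to inverse historical MSE, and that weights are parameterized by a softmax. The names `history_prior`, `objective`, `fit_weights`, and `combine`, and the simple backtracking gradient descent, are hypothetical choices made for illustration.

```python
import math

def softmax(theta):
    """Map unconstrained parameters to weights on the probability simplex."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def history_prior(hist_mse):
    """ASSUMPTION: history-based weights proportional to inverse historical MSE."""
    inv = [1.0 / m for m in hist_mse]
    total = sum(inv)
    return [v / total for v in inv]

def combine(weights, forecasts):
    """Linear opinion pool: weighted average of the experts' forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

def objective(theta, forecasts, prior, lam):
    """ASSUMED objective: weighted spread of current forecasts + lam * KL(w || prior)."""
    w = softmax(theta)
    pooled = combine(w, forecasts)
    spread = sum(wi * (fi - pooled) ** 2 for wi, fi in zip(w, forecasts))
    kl = sum(wi * math.log(wi / pi) for wi, pi in zip(w, prior) if wi > 0)
    return spread + lam * kl

def fit_weights(forecasts, hist_mse, lam, iters=500, step=1.0, eps=1e-6):
    """Gradient descent on theta with numerical gradients and step halving."""
    prior = history_prior(hist_mse)
    theta = [math.log(p) for p in prior]  # start at the history-only weights
    best = objective(theta, forecasts, prior, lam)
    for _ in range(iters):
        grad = []
        for k in range(len(theta)):
            up, down = theta[:], theta[:]
            up[k] += eps
            down[k] -= eps
            diff = objective(up, forecasts, prior, lam) - objective(down, forecasts, prior, lam)
            grad.append(diff / (2 * eps))
        candidate = [t - step * g for t, g in zip(theta, grad)]
        value = objective(candidate, forecasts, prior, lam)
        if value < best:   # accept only steps that lower the objective
            theta, best = candidate, value
        else:              # otherwise shrink the step and try again
            step *= 0.5
    return softmax(theta)

# Made-up numbers: three experts' current forecasts and historical MSEs.
current = [10.0, 12.0, 11.0]
past_mse = [1.0, 4.0, 2.0]
weights = fit_weights(current, past_mse, lam=1.0)
print(weights, combine(weights, current))
```

In this sketch the regularization strength `lam` controls the balance between the two information sources: a very large `lam` keeps the weights close to the history-based prior, while a small `lam` lets the current forecasts dominate. The abstract's rolling-window tuning of the regularization strength is not shown here.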
