Hierarchical Forecasting at Scale
Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke
TL;DR
The paper tackles the scalability challenge of hierarchical forecasting when millions of time series are involved. It introduces a sparse hierarchical loss (HL) that directly enforces cross-sectional and temporal coherency within a single bottom-level forecast model, removing the need for costly post-hoc reconciliation. The approach achieves quadratic scaling in the hierarchy and demonstrates substantial performance and efficiency gains on both public (M5) and production (bol) datasets, outperforming reconciliation-based methods and improving product-level forecasts. Practically, HL enables end-to-end, coherently aggregated forecasts at scale, reducing deployment complexity and prediction-time cost, with future work aimed at probabilistic extensions and robustness to hierarchy misspecification.
Abstract
Existing hierarchical forecasting techniques scale poorly when the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it gives practitioners a way to produce bottom-level forecasts that are coherent with any chosen cross-sectional or temporal hierarchy. In addition, it removes the post-processing step that traditional hierarchical forecasting techniques require, which reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function improves RMSE by up to 10% over the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, which improves forecasting performance by 2% at the product level. We also find an improvement of about 5-10% when evaluating forecasting performance across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
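To make the idea of a loss that optimizes the hierarchy directly more concrete, the following is a minimal sketch of a cross-sectional hierarchical loss for a single time step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the group-label representation of the hierarchy, and the equal weighting of levels are all hypothetical choices made here. The hierarchy is stored sparsely as one group label per bottom-level series and level, rather than as a dense summing matrix.

```python
from collections import defaultdict


def aggregate(values, groups):
    """Sum bottom-level values into aggregates.

    groups[i] is the aggregate key of bottom-level series i, which is a
    sparse way to store one level of the hierarchy (assumed representation).
    """
    totals = defaultdict(float)
    for value, key in zip(values, groups):
        totals[key] += value
    return dict(totals)


def hierarchical_loss(y_true, y_pred, levels):
    """Squared error of aggregated forecasts, averaged over hierarchy levels.

    Equal weighting of levels is an assumption of this sketch; the paper's
    exact weighting may differ.
    """
    loss = 0.0
    for groups in levels:
        agg_true = aggregate(y_true, groups)
        agg_pred = aggregate(y_pred, groups)
        # Mean squared error over the aggregates in this level.
        level_mse = sum(
            (agg_pred[key] - agg_true[key]) ** 2 for key in agg_true
        ) / len(agg_true)
        loss += level_mse
    return loss / len(levels)


# Four hypothetical products in two categories, plus a grand total.
y_true = [3.0, 1.0, 2.0, 4.0]
y_pred = [2.0, 2.0, 2.0, 5.0]
levels = [
    [0, 1, 2, 3],            # bottom level: every product is its own group
    ["A", "A", "B", "B"],    # category level
    ["total"] * 4,           # top level
]

loss = hierarchical_loss(y_true, y_pred, levels)
print(loss)  # 0.75
```

Because every aggregate forecast is computed as a sum of bottom-level forecasts, the forecasts are coherent by construction and no reconciliation step is needed afterwards. A temporal hierarchy could be handled the same way by grouping over time steps instead of series, which this sketch omits.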
