Hierarchical Forecasting at Scale

Olivier Sprangers; Wander Wadman; Sebastian Schelter; Maarten de Rijke

TL;DR

The paper tackles the scalability challenge of hierarchical forecasting when millions of time series are involved. It introduces a sparse hierarchical loss (HL) that directly enforces cross-sectional and temporal coherency within a single bottom-level forecast model, removing the need for costly post-hoc reconciliation. The approach achieves quadratic scaling in the hierarchy and demonstrates substantial performance and efficiency gains on both public (M5) and production (bol) datasets, outperforming reconciliation-based methods and improving product-level forecasts. Practically, HL enables end-to-end, coherently aggregated forecasts at scale, reducing deployment complexity and prediction-time cost, with future work aimed at probabilistic extensions and robustness to hierarchy misspecification.
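
For readers new to the term, coherency can be stated in the summing-matrix notation that is common in the hierarchical forecasting literature. The symbols below ($S$, $\mathbf{b}_t$, $\mathbf{y}_t$, $n$, $n_b$) are assumptions introduced here for illustration and are not taken from this summary.

```latex
% b_t: vector of the n_b bottom-level series at time t
% y_t: vector of all n series (bottom level plus every aggregate)
% S:   sparse 0/1 summing matrix that encodes the hierarchy
\mathbf{y}_t = S\,\mathbf{b}_t, \qquad S \in \{0,1\}^{n \times n_b}

% A set of forecasts is coherent when it satisfies the same constraint:
\hat{\mathbf{y}}_t = S\,\hat{\mathbf{b}}_t
```

Forecasts built by aggregating a single bottom-level model satisfy this constraint by construction, which is why no separate reconciliation step is needed at prediction time.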

Abstract

Existing hierarchical forecasting techniques scale poorly as the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it gives practitioners a method for producing bottom-level forecasts that are coherent with any chosen cross-sectional or temporal hierarchy. In addition, removing the post-processing step required by traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% better (RMSE) than the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, which improves forecasting performance by 2% at the product level. Finally, we find an increase in forecasting performance of about 5-10% when evaluating across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
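
The idea of a loss that scores bottom-level forecasts at every level of a hierarchy can be illustrated with a minimal sketch. This is not the paper's exact loss: the squared-error form, the per-level averaging weight, and the names `aggregate`, `hierarchical_loss`, and `levels` are all assumptions made for illustration. Each group lists only the bottom-level indices it sums over, which plays the role of one sparse row of a summing matrix.

```python
def aggregate(bottom, groups):
    """Sum bottom-level values into each group (one sparse summing-matrix row per group)."""
    return [sum(bottom[i] for i in members) for members in groups]


def hierarchical_loss(forecast, actual, levels):
    """Squared error accumulated over all aggregation levels, plus its gradient.

    forecast, actual: lists of bottom-level values.
    levels: list of levels; each level is a list of groups; each group is a
            list of bottom-level indices (hypothetical encoding of the hierarchy).
    Returns (loss, grad), where grad is d(loss)/d(forecast[i]).
    """
    loss = 0.0
    grad = [0.0] * len(forecast)
    for groups in levels:
        agg_forecast = aggregate(forecast, groups)
        agg_actual = aggregate(actual, groups)
        # Assumption: average the error within a level so every level
        # contributes on a comparable scale.
        weight = 1.0 / len(groups)
        for members, f, y in zip(groups, agg_forecast, agg_actual):
            err = f - y
            loss += 0.5 * weight * err * err
            # The aggregate error flows back only to the members of the group.
            for i in members:
                grad[i] += weight * err
    return loss, grad


# Toy hierarchy: 4 products -> 2 categories -> 1 total.
levels = [
    [[0], [1], [2], [3]],   # product level
    [[0, 1], [2, 3]],       # category level
    [[0, 1, 2, 3]],         # total
]
forecast = [1.0, 2.0, 3.0, 4.0]
actual = [2.0, 1.0, 3.0, 5.0]

loss, grad = hierarchical_loss(forecast, actual, levels)
print(loss)  # 1.125
print(grad)  # [-1.25, -0.75, -1.5, -1.75]
```

In this toy case the errors of products 0 and 1 cancel at the category level, so their gradients differ from those of products 2 and 3 even though the product-level errors have the same magnitude. A gradient of this kind could be supplied to any learner that accepts a custom objective; the work per level is proportional to the number of bottom-level series, since each series belongs to exactly one group per level.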

Paper Structure (40 sections, 22 equations, 2 figures, 9 tables)

Introduction
Challenges with existing cross-sectional and temporal hierarchical forecasting techniques
Sparse loss function
Evaluation
Contributions
Related work
Forecasting for large-scale settings
Hierarchical forecasting
Background
Problem definition
Reconciliation methods
Other methods
Scaling issues of hierarchical forecasting methods
Scaling issues with reconciliation methods
Scaling issues with other methods
...and 25 more sections

Figures (2)

Figure 1: Forecasting results for the primary product forecasting model at our e-commerce partner bol. We show RMSE (a, left) and MAE (b, right) by weekly demand bucket relative to the Tweedie loss baseline for each forecasting horizon (week). The Hierarchical loss outperforms the Tweedie loss on RMSE and MAE on smaller weekly demand buckets.
Figure 2: Forecasting results for the primary product forecasting model at our e-commerce partner bol. We show RMSE (left column of figures) and MAE (right column of figures) by aggregation level relative to the Tweedie loss baseline for each forecasting horizon (week). The Hierarchical loss commonly outperforms the Tweedie loss on every aggregation level.
