Multi-layer Stack Ensembles for Time Series Forecasting

Nathanael Bosch, Oleksandr Shchur, Nick Erickson, Michael Bohlke-Schneider, Caner Türkmen

TL;DR

The paper identifies forecast ensembling as a key driver of accuracy in time series forecasting and demonstrates that stacking, especially when deployed in a multi-layer framework, yields robust gains across diverse real-world datasets. It introduces a three-layer stacking architecture (L1 base forecasters, L2 stackers, L3 aggregator) and provides a rigorous training protocol via time-series cross-validation to prevent leakage. Through a large-scale benchmark of 33 ensemble methods on 50 datasets, the study shows that multi-layer stacking consistently outperforms single-layer approaches and simple averages, while remaining robust to changes in base-model selection. The results have practical implications for AutoML systems in forecasting, suggesting that adaptive, diverse ensembling strategies can significantly improve predictive accuracy across point and probabilistic tasks, albeit at higher computation costs. The work also offers detailed ablations and guidance on data usage, model choices, and retraining strategies to maximize performance while acknowledging resource trade-offs.
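The three-layer idea described above can be illustrated with a small sketch. This is not the authors' implementation: the three base forecasters, the three L2 stackers, the synthetic series, and the choice to fit L2 on the earlier validation windows and L3 on the last one are all assumptions made for illustration. The only detail taken from the paper's own figure list is that L3 uses an ensemble selection algorithm, approximated here by a simple greedy selection with replacement. Every L1 forecast is produced from data strictly before its window's cutoff, which is how the sketch avoids leakage.

```python
import numpy as np

# --- L1: hypothetical base forecasters (stand-ins for real models) ---
def naive(history, horizon):
    return np.full(horizon, history[-1])

def seasonal_naive(history, horizon, season=7):
    reps = int(np.ceil(horizon / season))
    return np.tile(history[-season:], reps)[:horizon]

def mean_forecast(history, horizon):
    return np.full(horizon, history.mean())

BASE_MODELS = [naive, seasonal_naive, mean_forecast]

def l1_predictions(y, cutoff, horizon):
    """Forecasts of all base models, using only data before `cutoff`."""
    history = y[:cutoff]
    return np.column_stack([m(history, horizon) for m in BASE_MODELS])

# --- L2: hypothetical stackers, each returning weights over L1 models ---
def fit_average(X, target):
    return np.full(X.shape[1], 1.0 / X.shape[1])

def fit_best_single(X, target):
    errors = np.abs(X - target[:, None]).mean(axis=0)
    w = np.zeros(X.shape[1])
    w[np.argmin(errors)] = 1.0
    return w

def fit_linear(X, target):
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w

L2_FITTERS = [fit_average, fit_best_single, fit_linear]

# --- L3: greedy ensemble selection over the L2 stackers ---
def greedy_ensemble_selection(Z, target, rounds=10):
    counts = np.zeros(Z.shape[1])
    running_sum = np.zeros(len(target))
    for step in range(1, rounds + 1):
        # Pick the stacker whose addition gives the lowest mean absolute error.
        errors = [np.abs((running_sum + Z[:, j]) / step - target).mean()
                  for j in range(Z.shape[1])]
        best = int(np.argmin(errors))
        counts[best] += 1
        running_sum += Z[:, best]
    return counts / counts.sum()

def fit_multilayer_stack(y, horizon=7, num_windows=4):
    # Validation windows are consecutive blocks at the end of the series.
    n = len(y)
    cutoffs = [n - (num_windows - k) * horizon for k in range(num_windows)]
    X = [l1_predictions(y, c, horizon) for c in cutoffs]
    Y = [y[c:c + horizon] for c in cutoffs]
    # L2 stackers see the L1 forecasts of all windows except the last.
    X_l2, y_l2 = np.vstack(X[:-1]), np.concatenate(Y[:-1])
    l2_weights = [fit(X_l2, y_l2) for fit in L2_FITTERS]
    # L3 is fit on the last window, which no L2 stacker was trained on.
    Z = np.column_stack([X[-1] @ w for w in l2_weights])
    l3_weights = greedy_ensemble_selection(Z, Y[-1])
    return l2_weights, l3_weights

def predict(y, l2_weights, l3_weights, horizon=7):
    X = l1_predictions(y, len(y), horizon)
    Z = np.column_stack([X @ w for w in l2_weights])
    return Z @ l3_weights

# Synthetic weekly-seasonal series, for demonstration only.
rng = np.random.default_rng(0)
t = np.arange(140)
y = 10 + 0.05 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, 140)

l2_weights, l3_weights = fit_multilayer_stack(y)
forecast = predict(y, l2_weights, l3_weights)
print("L3 weights over L2 stackers:", l3_weights)
print("7-step forecast:", np.round(forecast, 2))
```

Because the L3 weights are selection counts normalized to sum to one, they are non-negative and directly readable as the share each L2 stacker contributes, which is the kind of quantity a per-stacker weight plot would display.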

Abstract

Ensembling is a powerful technique for improving the accuracy of machine learning models, with methods like stacking achieving strong results in tabular tasks. In time series forecasting, however, ensemble methods remain underutilized, with simple linear combinations still considered state-of-the-art. In this paper, we systematically explore ensembling strategies for time series forecasting. We evaluate 33 ensemble models -- both existing and novel -- across 50 real-world datasets. Our results show that stacking consistently improves accuracy, though no single stacker performs best across all tasks. To address this, we propose a multi-layer stacking framework for time series forecasting, an approach that combines the strengths of different stacker models. We demonstrate that this method consistently provides superior accuracy across diverse forecasting scenarios. Our findings highlight the potential of stacking-based methods to improve AutoML systems for time series forecasting.

Paper Structure

This paper contains 35 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Architecture and training procedure of a single-layer stacker model.
  • Figure 2: Architecture and training procedure of a multi-layer stacker model.
  • Figure 3: Weights assigned by the L3 ensemble selection algorithm to the L2 models (average over 50 tasks).
  • Figure 4: Influence of the number of validation windows on the model performance.