Forecasting with Hyper-Trees
Alexander März, Kashif Rasul
TL;DR
The paper addresses forecasting under time-varying dynamics by using gradient boosted decision trees (GBDTs) to learn the parameters of a target time-series model, such as AR($p$) or ETS, as functions of input features. It introduces Hyper-Trees, a framework that turns GBDTs into parameter learners for time-series models, and Hyper-TreeNet, a hybrid encoder–decoder in which a shallow neural network scales parameter estimation to larger parameter sets. The approach achieves competitive results on a diverse set of datasets in both local and global forecasting tasks, demonstrates improved extrapolation through parameter-space learning, and offers interpretability because the features visibly modulate the model parameters. The work links classical time-series inductive biases with modern tree-based learning, combining cross-series information with time-series structure for practical forecasting applications.
Abstract
We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series model, such as ARIMA or Exponential Smoothing, as functions of features. The target model then uses these parameters to generate the final forecasts. Our framework combines the effectiveness of decision trees on tabular data with classical forecasting models, thereby introducing a time series inductive bias into tree-based models. To address the scaling limitations of boosted trees when estimating a high-dimensional set of target model parameters, we combine decision trees and neural networks within a unified framework. In this hybrid approach, the trees generate informative representations from the input features, and a shallow network takes these representations as input to learn the parameters of a time series model. We explore the effectiveness of Hyper-Trees across a range of forecasting tasks and extend tree-based modeling beyond its conventional use in time series analysis.
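To make the core idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an AR(1) target model $y_t = \phi(x_t)\, y_{t-1} + \varepsilon_t$ with a single feature $x_t$, a squared-error loss, and hand-written regression stumps in place of a full GBDT library. All function names (`fit_stump`, `fit_hyper_ar1`, `predict_phi`), the simulated data, and the hyperparameters are hypothetical choices made for illustration. The point it shows is that the boosting gradient is taken with respect to the parameter $\phi$ rather than the forecast, so the trees output a parameter that the AR(1) model then turns into a forecast.

```python
import numpy as np

LEARNING_RATE = 0.1  # illustrative value, not taken from the paper


def fit_stump(x, target, thresholds):
    """Least-squares regression stump: a single split on one feature."""
    best = None
    for thr in thresholds:
        left = x <= thr
        lv, rv = target[left].mean(), target[~left].mean()
        sse = ((target[left] - lv) ** 2).sum() + ((target[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]  # (threshold, left_value, right_value)


def predict_phi(stumps, x):
    """Sum the stump outputs to obtain the AR(1) parameter phi(x)."""
    phi = np.zeros(len(x))
    for thr, lv, rv in stumps:
        phi += LEARNING_RATE * np.where(x <= thr, lv, rv)
    return phi


def fit_hyper_ar1(x, y_lag, y, n_rounds=100):
    """Boost stumps in parameter space for the forecast phi(x_t) * y_{t-1}."""
    thresholds = np.quantile(x, np.linspace(0.1, 0.9, 9))
    phi = np.zeros(len(y))
    stumps = []
    for _ in range(n_rounds):
        residual = y - phi * y_lag
        # Negative gradient of 0.5 * residual**2 with respect to phi.
        neg_grad = residual * y_lag
        thr, lv, rv = fit_stump(x, neg_grad, thresholds)
        phi += LEARNING_RATE * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))
    return stumps


# Simulated series whose AR(1) coefficient switches with the feature x_t.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(size=n)
phi_true = np.where(x > 0.5, 0.8, -0.5)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true[t] * y[t - 1] + rng.normal()

x_t, y_lag, y_t = x[1:], y[:-1], y[1:]
stumps = fit_hyper_ar1(x_t, y_lag, y_t)
phi_hat = predict_phi(stumps, x_t)

# Baseline: a single constant AR(1) coefficient fitted by least squares.
phi_const = (y_lag @ y_t) / (y_lag @ y_lag)
mse_hyper = np.mean((y_t - phi_hat * y_lag) ** 2)
mse_const = np.mean((y_t - phi_const * y_lag) ** 2)
print(f"in-sample MSE, feature-dependent phi: {mse_hyper:.3f}")
print(f"in-sample MSE, constant phi:          {mse_const:.3f}")
```

Because the trees only produce $\phi(x_t)$, the forecast still scales with $y_{t-1}$ and can therefore move outside the range of target values seen in training, which a tree that predicts $y_t$ directly cannot do. In practice the same gradient could be supplied to a GBDT library through a custom objective; that wiring is omitted here.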
