Forecasting with Hyper-Trees

Alexander März; Kashif Rasul

TL;DR

The paper addresses forecasting with time-varying dynamics by learning the parameters of a target time-series model (such as AR($p$) or ETS) as functions of input features using gradient boosted trees. It introduces Hyper-Trees, a framework that converts GBDTs into parameter learners for time-series models, and a hybrid Hyper-TreeNet encoder–decoder to scale parameter estimation via a shallow neural network. The approach achieves competitive results across a diverse set of datasets in local and global forecasting tasks, demonstrates improved extrapolation through parameter-space learning, and provides interpretability via feature-based parameter modulation. This work links classical time-series inductive biases with modern tree-based learning, offering a principled, scalable path to combine cross-series information with time-series structure for practical forecasting applications.
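The core idea, that trees output the parameters of a target model rather than the forecast itself, can be illustrated with a small NumPy sketch. It makes several assumptions that do not come from the paper:

- The target model is an AR($p$) model with squared-error loss.
- The "trees" are depth-one stumps with Newton-style leaf values $-G/(H+\lambda)$, standing in for a full GBDT library.
- The function names (`make_lags`, `fit_stump`, `fit_hyper_tree_ar`, `predict_params`) are hypothetical.

The gradients and Hessians are taken with respect to each AR coefficient $\phi_{j,t}$ by passing through the target model, so the trees learn in parameter space.

```python
import numpy as np


def make_lags(y, p):
    """Return (Y_lag, target) with Y_lag[i, j] = y[t - j - 1] and target[i] = y[t], t = p + i."""
    n = len(y) - p
    Y_lag = np.column_stack([y[p - j - 1 : p - j - 1 + n] for j in range(p)])
    return Y_lag, y[p:]


def fit_stump(X, g, h, lam=1.0):
    """Fit a depth-1 tree on gradients g and Hessians h (Newton leaf values -G / (H + lam))."""
    w0 = -g.sum() / (h.sum() + lam)
    best = (-np.inf, 0, np.inf, w0, w0)  # (gain, feature, threshold, left value, right value)
    for f in range(X.shape[1]):
        order = np.argsort(X[:, f])
        xs, gs, hs = X[order, f], g[order], h[order]
        GL, HL = np.cumsum(gs)[:-1], np.cumsum(hs)[:-1]
        GR, HR = gs.sum() - GL, hs.sum() - HL
        valid = xs[:-1] < xs[1:]  # only split between distinct feature values
        if not valid.any():
            continue
        gain = np.where(valid, GL**2 / (HL + lam) + GR**2 / (HR + lam), -np.inf)
        i = int(np.argmax(gain))
        if gain[i] > best[0]:
            best = (gain[i], f, 0.5 * (xs[i] + xs[i + 1]),
                    -GL[i] / (HL[i] + lam), -GR[i] / (HR[i] + lam))
    _, f, thr, w_left, w_right = best
    return lambda Xn: np.where(Xn[:, f] <= thr, w_left, w_right)


def fit_hyper_tree_ar(X, y, p=2, n_rounds=200, lr=0.1, lam=1.0):
    """Boost one stump per AR coefficient per round; the trees output phi_{j,t}, not y_t."""
    Y_lag, target = make_lags(y, p)
    Xf = X[p:]                      # features observed at the forecast time t
    phi = np.zeros((len(target), p))
    trees = []
    for _ in range(n_rounds):
        y_hat = (phi * Y_lag).sum(axis=1)   # target model: AR(p) with feature-dependent coefficients
        resid = target - y_hat
        round_trees = []
        for j in range(p):
            g = -resid * Y_lag[:, j]        # d/dphi_j of 0.5 * (y_t - y_hat_t)^2
            h = Y_lag[:, j] ** 2            # second derivative w.r.t. phi_j
            tree = fit_stump(Xf, g, h, lam)
            phi[:, j] += lr * tree(Xf)
            round_trees.append(tree)
        trees.append(round_trees)
    return {"trees": trees, "lr": lr}


def predict_params(model, X):
    """Sum the shrunken stump outputs to get AR coefficients for each row of X."""
    phi = np.zeros((X.shape[0], len(model["trees"][0])))
    for round_trees in model["trees"]:
        for j, tree in enumerate(round_trees):
            phi[:, j] += model["lr"] * tree(X)
    return phi


# Usage: an AR(1) whose coefficient switches with an observed regime feature.
rng = np.random.default_rng(0)
T = 2000
regime = (np.arange(T) // 100) % 2
phi_true = np.where(regime == 0, 0.8, -0.5)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true[t] * y[t - 1] + rng.normal()
X = regime.reshape(-1, 1).astype(float)

model = fit_hyper_tree_ar(X, y, p=1)
phi_hat = predict_params(model, X[1:])
print(phi_hat[regime[1:] == 0, 0].mean(), phi_hat[regime[1:] == 1, 0].mean())
```

In this toy setup, the recovered coefficients should lie close to 0.8 and -0.5 for the two regimes. The trees never see $y_t$ as a regression target. They only receive gradient information that has been routed through the AR model.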

Abstract

We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series model, such as ARIMA or Exponential Smoothing, as functions of features. These parameters are then used by the target model to generate the final forecasts. Our framework combines the effectiveness of decision trees on tabular data with classical forecasting models, thereby inducing a time series inductive bias into tree-based models. To resolve the scaling limitations of boosted trees when estimating a high-dimensional set of target model parameters, we combine decision trees and neural networks within a unified framework. In this hybrid approach, the trees generate informative representations from the input features, which a shallow network then uses as input to learn the parameters of a time series model. With our research, we explore the effectiveness of Hyper-Trees across a range of forecasting tasks and extend tree-based modeling beyond its conventional use in time series analysis.
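The hybrid design can be made concrete with a forward and backward pass through a shallow network head that maps tree-generated embeddings to AR($p$) coefficients (compare Figure 5). This is a minimal sketch under assumed choices:

- Random embeddings `E` stand in for the Hyper-Tree outputs.
- The network has a single `tanh` hidden layer.
- The loss is squared error.
- The function name `hyper_treenet_forward_backward` is hypothetical.

The point of the sketch is that the gradient with respect to the embeddings has dimension $k$ rather than $p$. In a joint optimization scheme, that is the signal the trees would fit. This is our reading of the scaling argument, not the paper's stated algorithm.

```python
import numpy as np


def hyper_treenet_forward_backward(E, W1, b1, W2, b2, Y_lag, y):
    """Hypothetical Hyper-TreeNet-AR(p) head: embeddings -> MLP -> AR coefficients -> forecast.

    E: (n, k) tree embeddings, W1: (k, m), b1: (m,), W2: (m, p), b2: (p,),
    Y_lag: (n, p) lagged values y_{t-1}, ..., y_{t-p}, y: (n,) targets.
    """
    H = np.tanh(E @ W1 + b1)            # (n, m) hidden representation
    phi = H @ W2 + b2                   # (n, p) time-varying AR coefficients
    y_hat = (phi * Y_lag).sum(axis=1)   # AR(p) target model
    resid = y_hat - y
    loss = 0.5 * np.mean(resid ** 2)

    # Backward pass (chain rule through the target model, then the MLP).
    d_yhat = resid / len(y)             # dL/dy_hat
    d_phi = d_yhat[:, None] * Y_lag     # dL/dphi, shape (n, p)
    grads = {"W2": H.T @ d_phi, "b2": d_phi.sum(axis=0)}
    d_pre = (d_phi @ W2.T) * (1.0 - H ** 2)   # through tanh
    grads["W1"] = E.T @ d_pre
    grads["b1"] = d_pre.sum(axis=0)
    grads["E"] = d_pre @ W1.T           # (n, k): gradient signal for the trees
    return loss, phi, grads


# Usage: p = 12 AR coefficients produced from only k = 2 embedding dimensions.
rng = np.random.default_rng(1)
n, k, m, p = 50, 2, 8, 12
E = rng.normal(size=(n, k))
W1, b1 = rng.normal(scale=0.5, size=(k, m)), np.zeros(m)
W2, b2 = rng.normal(scale=0.1, size=(m, p)), np.zeros(p)
Y_lag, y = rng.normal(size=(n, p)), rng.normal(size=n)
loss, phi, grads = hyper_treenet_forward_backward(E, W1, b1, W2, b2, Y_lag, y)
print(phi.shape, grads["E"].shape)  # (50, 12) (50, 2)
```

In this sketch, the cost that grows with $p$ is confined to the network's output layer. The tree side only has to model $k$ outputs.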

Paper Structure (28 sections, 11 equations, 15 figures, 7 tables, 2 algorithms)

Introduction
Hyper-Trees
Gradient Boosted Decision Trees
Hyper-Tree Architecture
Related Work
Experiments
Local Time Series Model: Introductory Example
Local Time Series Models: Extended Evaluation
Global Time Series Models
Ablation Studies
Framework Analysis and Considerations
Parameter vs. Function Space
Feature Dependency
Target Model Dependency
Extrapolation Properties
...and 13 more sections

Figures (15)

Figure 1: Conventional GBDT architecture showing feature input, decision tree processing, output generation, and loss calculation, with a backward pass for gradient-based optimization.
Figure 2: Hyper-Tree architecture illustrating a unified framework where a Hyper-Tree generates parameters for a target model. The output of the target model is passed to a loss function, with gradients and Hessians flowing back, enabling learning of temporal dependencies and integration of diverse feature types.
Figure 3: Distributional Hyper-Tree Architecture for a probabilistic framework, where the Hyper-Tree generates parameters for both a target model and an output distribution. The target model outputs the mean $(\mu_{t})$ while the Hyper-Tree directly estimates the standard deviation $(\sigma_{t})$, enabling probabilistic forecasting.
Figure 4: Scaling performance comparison between Hyper-Tree-AR($p$) and Hyper-TreeNet-AR($p$) models. The figure shows runtimes as the number of AR parameters increases. All runtimes are normalized with respect to the runtime of estimating one target model parameter.
Figure 5: Hyper-TreeNet architecture illustrating a hybrid approach combining GBDTs and neural networks. A Hyper-Tree generates low-dimensional parameter-space embeddings, which are transformed by a Multi-Layer Perceptron (MLP) to generate parameters for a target model. The architecture allows for joint optimization, enabling integrated tree-based feature learning and network-based parameter mapping.
...and 10 more figures
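For the distributional variant in Figure 3, the per-observation gradients and Hessians that flow back to the trees have a closed form. The worked derivation below rests on three assumptions:

- The likelihood is Gaussian.
- The mean $\mu_t$ comes from an AR($p$) target model.
- The tree's scale output $\eta_t$ uses a log link, $\sigma_t = \exp(\eta_t)$. The link function is our assumption, not taken from the paper.

```latex
\begin{aligned}
\mu_t &= \sum_{j=1}^{p} \phi_{j,t}\, y_{t-j}, \qquad \sigma_t = \exp(\eta_t),\\
\ell_t &= \eta_t + \tfrac{1}{2}(y_t-\mu_t)^2 e^{-2\eta_t} + \tfrac{1}{2}\log(2\pi),\\
\frac{\partial \ell_t}{\partial \phi_{j,t}} &= -\frac{(y_t-\mu_t)\, y_{t-j}}{\sigma_t^2}, \qquad
\frac{\partial^2 \ell_t}{\partial \phi_{j,t}^2} = \frac{y_{t-j}^2}{\sigma_t^2},\\
\frac{\partial \ell_t}{\partial \eta_t} &= 1 - \frac{(y_t-\mu_t)^2}{\sigma_t^2}, \qquad
\frac{\partial^2 \ell_t}{\partial \eta_t^2} = \frac{2\,(y_t-\mu_t)^2}{\sigma_t^2}.
\end{aligned}
```

Fixing $\sigma_t = 1$ reduces the $\phi$ terms to the squared-error gradients and Hessians of a plain AR-based Hyper-Tree. The $\eta_t$ Hessian approaches zero when a residual is small. This is one reason Newton-style leaf weights of the form $-G/(H+\lambda)$ include a regularizer $\lambda$.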
