Hierarchical Ensemble-Based Feature Selection for Time Series Forecasting

Aysin Tumay, Mustafa E. Aydin, Ali T. Koc, Suleyman S. Kozat

TL;DR

This paper tackles feature selection for time series that have many features but few samples, in non-stationary settings. It proposes a hierarchical ensemble-based feature selection framework that layers multiple models on distinct feature subsets and uses cost-optimized weights to refine predictions and exploit feature co-dependency. The method supports arbitrary loss functions, extends GBM-related optimization to include external blocks, and emphasizes feature groups defined by domain knowledge. On synthetic data and the M4 hourly dataset, the Hierarchical Ensemble achieves robust, scalable improvements over wrapper, filter, embedded, and baseline GBM approaches, with statistical significance supported by paired t-tests. The authors provide open-source code and outline extensions to deeper hierarchies and alternative base models to broaden applicability.

Abstract

We introduce a novel ensemble approach for feature selection based on hierarchical stacking, designed for non-stationary data and/or settings with a limited number of samples and a large number of features. Our approach exploits the co-dependency between features using a hierarchical structure. Initially, a machine learning model is trained using a subset of features; the output of this model is then updated by other algorithms, in a hierarchical manner, using the remaining features to minimize the target loss. This hierarchical structure allows for flexible depth and feature selection. By exploiting feature co-dependency hierarchically, our approach overcomes the limitations of traditional feature selection methods and feature importance scores. We demonstrate its effectiveness on synthetic and well-known real-life datasets, where it provides significant, scalable, and stable performance improvements over traditional methods and state-of-the-art approaches. We also provide the source code of our approach to facilitate further research and replicability of our results.
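
The hierarchical update described in the abstract can be illustrated with a minimal two-layer sketch. It assumes squared loss, one feature per group, ordinary least-squares lines as base models, and an additive correction in which the second layer fits what the first layer leaves unexplained. None of these choices come from the paper; the helper names (`fit_line`, `fit_hierarchy`, `predict_hierarchy`) and the toy data are hypothetical.

```python
import random


def fit_line(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def fit_hierarchy(x_a, x_b, y):
    """Layer 1 uses feature group A; layer 2 corrects it with group B."""
    first = fit_line(x_a, y)
    y_tilde = predict_line(first, x_a)
    # Assumption: under squared loss with an additive update, the
    # per-sample optimal correction is the residual y - y_tilde.
    alpha = [yi - yt for yi, yt in zip(y, y_tilde)]
    second = fit_line(x_b, alpha)
    return first, second


def predict_hierarchy(models, x_a, x_b):
    first, second = models
    y_tilde = predict_line(first, x_a)
    alpha_tilde = predict_line(second, x_b)
    return [yt + at for yt, at in zip(y_tilde, alpha_tilde)]


# Toy data (hypothetical): the target depends on both feature groups.
rng = random.Random(0)
x_a = [rng.gauss(0, 1) for _ in range(200)]
x_b = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b in zip(x_a, x_b)]

models = fit_hierarchy(x_a, x_b, y)
first_only = predict_line(models[0], x_a)
full = predict_hierarchy(models, x_a, x_b)

# In-sample errors: first layer alone versus the two-layer hierarchy.
print(mse(y, first_only), mse(y, full))
```

Because the second layer can always output zero, its in-sample error cannot exceed that of the first layer alone; out-of-sample behavior depends on the data.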

Paper Structure

This paper contains 20 sections, 17 equations, 4 figures, 5 tables, and 2 algorithms.

Figures (4)

  • Figure 1: $K$ feature subsets serve as inputs to $K$ base models (blue). The base learners are then combined using $\boldsymbol{\alpha}_t^{(i)}$ to form the final prediction (pink).
  • Figure 2: Two feature subsets are fed to two different models in hierarchical order. The first layer (orange) takes the $y_t$-related features as input. Next, $\alpha_t^{(i)}$ is generated by cost optimization, and the second layer (pink) predicts $\alpha_t^{(i)}$. Finally, the second-layer predictions (green) are generated by combining $\tilde{\alpha}_t^{(i)}$ and $\tilde{y}_t^{(i)}$.
  • Figure 3: Mean squared error of the Hierarchical Ensemble (black), Ensemble (blue), Base LightGBM (green), Embedded (red), and Wrapper (purple) methods on the synthetic dataset.
  • Figure 4: Mean squared error of the Hierarchical Ensemble (black), Ensemble (blue), Base LightGBM (green), Embedded (red), and Wrapper (purple) methods on the M4 hourly competition dataset.
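
The flat ensemble in the Figure 1 caption can be sketched for $K = 2$ as follows. This is a simplified illustration, not the paper's procedure: it assumes squared loss, a single time-invariant convex weight instead of the time-varying $\boldsymbol{\alpha}_t^{(i)}$, and least-squares lines as base models. All function names and the toy data are hypothetical.

```python
import random


def fit_line(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def optimal_weight(y, p1, p2):
    """Weight w in [0, 1] minimizing sum((y - (w*p1 + (1-w)*p2))**2).

    Writing the error as (y - p2) - w * (p1 - p2) gives the closed form
    below; clipping keeps the combination convex (an assumption here).
    """
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0.0:
        return 0.5  # identical predictions: any weight gives the same loss
    return min(1.0, max(0.0, num / den))


# Toy data (hypothetical): each base model sees one feature subset.
rng = random.Random(1)
x_1 = [rng.gauss(0, 1) for _ in range(200)]
x_2 = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b in zip(x_1, x_2)]

p1 = predict_line(fit_line(x_1, y), x_1)  # base model on subset 1
p2 = predict_line(fit_line(x_2, y), x_2)  # base model on subset 2

w = optimal_weight(y, p1, p2)
combined = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]

# In-sample errors of each base model and of the weighted combination.
print(w, mse(y, p1), mse(y, p2), mse(y, combined))
```

Since `w = 0` and `w = 1` are both feasible, the combined in-sample error is never worse than that of the better base model. Unlike the hierarchical variant, each base model here is fit to the target independently, and only the combination step sees both predictions.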