RocketStack: Level-aware Deep Recursive Ensemble Learning Architecture

Çağatay Demirel

TL;DR

RocketStack achieves deep recursive stacking with sublinear computational growth and provides a modular, depth-aware foundation for scalable decision fusion as model pools and feature spaces evolve.

Abstract

Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains uncommon due to feature redundancy, complexity, and computational burden. To address these limitations, RocketStack is introduced as a level-aware recursive stacking architecture explored up to ten stacking levels, extending beyond prior architectures. At level 1, base-learner predictions are fused with original features; at later levels, weaker learners are incrementally pruned using out-of-fold (OOF) scores. To curb early saturation, pruning is regularized by applying Gaussian perturbations at two noise scales to OOF scores prior to model selection for next-level stacking, alongside deterministic pruning. To control feature growth, periodic compression is applied at levels 3, 6, and 9 using Simple, Fast, Efficient (SFE) filtering, attention-based selection, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), increasing accuracy with depth is confirmed by linear mixed-effects trend tests, and the best meta-model per level increasingly outperforms the best standalone ensemble. OOF-perturbed pruning is found to improve stability and late-level gains, while periodic compression is found to yield substantial runtime and dimensionality reductions with minimal accuracy drop. At the deepest level, accuracy slightly surpasses established deep tabular baselines. When hyperparameter optimization is performed on baseline models, early performance is boosted; however, untuned RocketStack closes the gap with depth and remains competitive at later levels. It achieves deep recursive stacking with sublinear computational growth and provides a modular, depth-aware foundation for scalable decision fusion as model pools and feature spaces evolve.
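The pruning and compression schedule described above can be illustrated with a minimal sketch. It is not the paper's implementation: the additive zero-mean Gaussian noise, the keep-top-k selection rule, the function names `perturbed_prune` and `is_compression_level`, and the model names and scores are all assumptions chosen for illustration. Only the ideas of perturbing OOF scores before selection and compressing at levels 3, 6, and 9 come from the abstract.

```python
import random


def perturbed_prune(oof_scores, keep, noise_scale=0.05, seed=0):
    """Return the names of the `keep` models with the highest (noisy) OOF score.

    oof_scores: dict mapping model name -> out-of-fold score (e.g. accuracy).
    noise_scale: std. dev. of zero-mean Gaussian noise added to each score
        before ranking (assumed form); 0 gives deterministic pruning.
    """
    rng = random.Random(seed)
    noisy = {
        name: score + (rng.gauss(0.0, noise_scale) if noise_scale > 0 else 0.0)
        for name, score in oof_scores.items()
    }
    # Rank models by perturbed score, best first, and keep the top `keep`.
    ranked = sorted(noisy, key=noisy.get, reverse=True)
    return ranked[:keep]


def is_compression_level(level, period=3):
    """True at levels 3, 6, 9, ... where periodic compression would run."""
    return level > 0 and level % period == 0


# Hypothetical OOF accuracies for five base learners.
scores = {"rf": 0.91, "xgb": 0.93, "svm": 0.88, "knn": 0.84, "logreg": 0.90}

strict = perturbed_prune(scores, keep=3, noise_scale=0.0)
light = perturbed_prune(scores, keep=3, noise_scale=0.05, seed=42)
compress_at = [lvl for lvl in range(1, 11) if is_compression_level(lvl)]
```

With zero noise the ranking is fully determined by the OOF scores, so the same learners survive on every run. With a nonzero noise scale, learners whose scores lie within roughly one noise standard deviation of the cut-off can occasionally swap places, which is the mechanism the abstract credits with curbing early saturation.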

Paper Structure

This paper contains 41 sections, 7 equations, 11 figures, 10 tables, 1 algorithm.

Figures (11)

  • Figure 1: Schematic overview of the study
  • Figure 2: Multi-level recursive ensemble learning (RocketStack) pipeline
  • Figure 3: Estimation performance across ensembling depths for different feature selection strategies in recursive blend ensembling. Each violin plot shows accuracy distributions across datasets for a given strategy, with the thick black dashed line marking the mean and thin gray lines tracking individual trends. A: Compares overall accuracy (%) across strategies, including a no-feature-selection baseline, for binary (A1) and multi-class (A2) tasks. Pruning variants (strict OOF-based pruning and randomized OOF-score perturbation; $\lambda = 0$: none, $0.05$: light, $0.1$: moderate) are applied to the SFE strategy (binary) and attention-based strategy (multi-class). B: Highlights the baseline vs. the best-performing periodic variants (SFE for binary in B1, attention for multi-class in B2) under strict pruning. C: Assesses selection frequency patterns, contrasting per-level vs. periodic (levels 3, 6, 9) selection for binary (C1) and multi-class (C2) tasks; includes SFE, autoencoder, and attention-based strategies. D: Evaluates RocketStack under OOF-score randomizations prior to pruning: in binary (D1), light randomization outperforms strict pruning, and moderate offers no further gain; in multi-class (D2), both light and moderate outperform strict pruning, with light performing best. Asterisks (*) indicate statistically significant differences (FDR-corrected, $p < .05$); 2L: 2-layer autoencoder; 3L: 3-layer autoencoder; periodic refers to levels 3, 6, and 9.
  • Figure 4: Trend analysis results across ensembling depth. A: Shows accuracy (%) trends from baseline (individual) model performance to recursive stacking levels (L1–L10), including stack-of-stacking as the final level. Averages are computed across all surviving models, datasets, and 5-fold splits for both binary (A1) and multi-class (A2) settings. A broken axis is used in some multi-class trend visuals (A2) to compress the large level-0 to level-1 gain and better emphasize gradual improvements from level 1 to level 10. Statistical significance of trends was assessed using linear mixed-effects models (LMM), comparing performance from the baseline through L10 (excluding stack-of-stacking); asterisks (*) indicate significant trends ($p < .001$), while “N.S.” denotes non-significant results. B: Reports direct accuracy differences between baseline and level-10 performance across all RocketStack variants for binary and multi-class tasks, highlighting the superiority of periodic attention-based feature compression with light OOF-score randomization for pruning. C: Visualizes model-wise performance trajectories from individual models through stack-of-stacking without averaging, using heatmaps to depict progression across ensembling levels. Periodic refers to levels 3, 6, and 9.
  • Figure 5: Performance comparison of RocketStack across ensembling levels against top-performing baseline ensemble models in binary (A) and multi-class (B) classification tasks. Each violin plot represents the distribution of accuracy scores at a given ensembling level, with individual baseline ensemble models shown in gray. The dashed black line connects the highest-performing model at each ensembling level, highlighting performance trends as depth increases. Different feature selection strategies (SFE, autoencoder-based, attention-based) are evaluated to assess their impact on ensembling effectiveness. Periodic refers to feature selection applied specifically at levels 3, 6, and 9. Overall, deeper RocketStack levels yield superior or comparable performance to baseline ensemble models across the evaluated scenarios.
  • ...and 6 more figures