PSEO: Optimizing Post-hoc Stacking Ensemble Through Hyperparameter Tuning

Beicheng Xu, Wei Liu, Keyao Ding, Yupeng Lu, Bin Cui

TL;DR

PSEO tackles the CASH-driven AutoML bottleneck by optimizing post-hoc stacking ensembles rather than relying on fixed ensemble designs. It introduces a tractable base-model subset selection via a BQP formulation with an SDP relaxation to balance individual model quality and inter-model diversity, and then builds a deep stacking ensemble with Dropout and Retain to exploit multi-layer architectures while mitigating overfitting and feature degradation. The final ensemble hyperparameters are explored with Bayesian optimization over a space that includes ensemble size, diversity weight, layer count, blender model, dropout rate, and retain flag, enabling task-specific configurations. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, with statistically significant improvements over AutoGluon baselines, demonstrating robust gains from task-adaptive base-model selection, deep stacking, and principled ensemble optimization.
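To make the quality-versus-diversity trade-off concrete, the sketch below scores every fixed-size subset of a small model pool with a binary quadratic objective and keeps the best one. It is illustrative only: the objective form $x^\top Q x$ (per-model scores on the diagonal of $Q$, weighted pairwise diversity off the diagonal), the names `select_subset`, `perf`, `div`, and `omega`, and the toy numbers are all assumptions, and exhaustive enumeration stands in for the SDP relaxation, which this excerpt does not specify.

```python
from itertools import combinations

import numpy as np


def select_subset(perf, div, size, omega):
    """Maximise x^T Q x over binary x with exactly `size` ones (brute force).

    Assumed objective, not taken from the paper: Q has per-model quality
    on its diagonal and omega-weighted pairwise diversity elsewhere.
    """
    n = len(perf)
    Q = omega * np.asarray(div, dtype=float)  # new array, input is untouched
    np.fill_diagonal(Q, perf)
    best_score, best_subset = -np.inf, None
    for subset in combinations(range(n), size):
        x = np.zeros(n)
        x[list(subset)] = 1.0
        score = x @ Q @ x
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Toy pool: models 0 and 1 are strong but nearly identical,
# model 2 is weaker but makes different errors.
perf = [0.90, 0.89, 0.80]
div = [
    [0.00, 0.01, 0.30],
    [0.01, 0.00, 0.28],
    [0.30, 0.28, 0.00],
]

print(select_subset(perf, div, size=2, omega=0.0))  # quality only
print(select_subset(perf, div, size=2, omega=1.0))  # quality plus diversity
```

With the diversity weight at zero the two strongest models are chosen; with a positive weight the weaker but more diverse model displaces one of them. Enumeration grows combinatorially with the pool size, which is why a relaxation is needed in practice.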

Abstract

The Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem is fundamental in Automated Machine Learning (AutoML). Inspired by the success of ensemble learning, recent AutoML systems construct post-hoc ensembles for final predictions rather than relying on the best single model. However, while most CASH methods conduct extensive searches for the optimal single model, they typically employ fixed strategies during the ensemble phase that fail to adapt to specific task characteristics. To tackle this issue, we propose PSEO, a framework for post-hoc stacking ensemble optimization. First, we conduct base model selection through binary quadratic programming, with a trade-off between diversity and performance. Furthermore, we introduce two mechanisms to fully realize the potential of multi-layer stacking. Finally, PSEO builds a hyperparameter space and searches for the optimal post-hoc ensemble strategy within it. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, including post-hoc designs in recent AutoML systems and state-of-the-art ensemble learning methods.
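As a rough picture of what searching an ensemble hyperparameter space can look like, the sketch below encodes the six dimensions named in the TL;DR (ensemble size, diversity weight, layer count, blender model, dropout rate, retain flag) and searches them with plain random search. Everything specific here is an assumption: the value ranges, the blender names, and the helper names `EnsembleConfig`, `sample_config`, and `random_search` are placeholders, and random search is substituted for the Bayesian optimization the paper uses, purely to keep the example dependency-free.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleConfig:
    ensemble_size: int
    diversity_weight: float
    n_layers: int
    blender: str
    dropout_rate: float
    retain: bool


def sample_config(rng):
    # Ranges and blender names are hypothetical, not values from the paper.
    return EnsembleConfig(
        ensemble_size=rng.randint(2, 20),
        diversity_weight=rng.uniform(0.0, 1.0),
        n_layers=rng.randint(1, 3),
        blender=rng.choice(["linear", "gbdt"]),
        dropout_rate=rng.uniform(0.0, 0.5),
        retain=rng.choice([True, False]),
    )


def random_search(evaluate, n_trials, seed=0):
    """Return the sampled config with the lowest value of `evaluate`."""
    rng = random.Random(seed)
    candidates = [sample_config(rng) for _ in range(n_trials)]
    return min(candidates, key=evaluate)


# Stand-in objective: a real one would build the stacking ensemble
# described by `cfg` and return its validation loss.
def toy_loss(cfg):
    return abs(cfg.dropout_rate - 0.2) + 0.01 * cfg.n_layers


best = random_search(toy_loss, n_trials=50)
print(best)
```

A model-based optimizer would replace `random_search` by proposing each new configuration from the losses observed so far, while the configuration object and the objective would keep the same shape.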

Paper Structure

This paper contains 38 sections, 3 theorems, 24 equations, 11 figures, 15 tables, 1 algorithm.

Key Result

Theorem 1

Consider a set of uncorrelated base model predictions $\mathcal{Z} = \{z_1, z_2, \ldots, z_{n'}\}$ in a weighted ensemble $z_{\text{ens}} = \sum_i \beta_i z_i$. The $i$-th prediction is dropped out with rate $d_i$ ($d_i = \rho_i\gamma_0$ and $\rho_1 = 1 \geq \rho_i \geq 0$, $i\in\{1,2,\ldots,n'\}$). If we pe

Figures (11)

  • Figure 1: Average test rank of stacking across each hyperparameter's values with other parameters fixed.
  • Figure 2: Overview of PSEO.
  • Figure 3: Average test ranks of stacking with different base model subset selection methods across 80 datasets.
  • Figure 4: Effectiveness of Dropout and Retain mechanisms.
  • Figure 5: Normalised Improvement Boxplots: Higher is better. Each dot represents a dataset. The number in square brackets counts the outliers that are not shown in the plot.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 2: Expectation of Average Estimator with Multiple Sampling
  • proof
  • proof