PSEO: Optimizing Post-hoc Stacking Ensemble Through Hyperparameter Tuning
Beicheng Xu, Wei Liu, Keyao Ding, Yupeng Lu, Bin Cui
TL;DR
PSEO addresses a limitation of CASH-based AutoML systems, which search extensively for the best single model but build the final ensemble with a fixed design, by optimizing the post-hoc stacking ensemble itself. It selects a subset of base models through a binary quadratic programming (BQP) formulation, made tractable with a semidefinite programming (SDP) relaxation, that balances individual model quality against inter-model diversity. It then builds a deep stacking ensemble with two mechanisms, Dropout and Retain, to exploit multi-layer architectures while mitigating overfitting and feature degradation. The ensemble hyperparameters are explored with Bayesian optimization over a space that includes ensemble size, diversity weight, layer count, blender model, dropout rate, and retain flag, enabling task-specific configurations. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, with statistically significant improvements over AutoGluon baselines, demonstrating robust gains from task-adaptive base-model selection, deep stacking, and principled ensemble optimization.
Abstract
The Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem is fundamental in Automated Machine Learning (AutoML). Inspired by the success of ensemble learning, recent AutoML systems construct post-hoc ensembles for final predictions rather than relying on the best single model. However, while most CASH methods conduct extensive searches for the optimal single model, they typically employ fixed strategies during the ensemble phase that fail to adapt to specific task characteristics. To tackle this issue, we propose PSEO, a framework for post-hoc stacking ensemble optimization. First, we conduct base model selection through binary quadratic programming, with a trade-off between diversity and performance. Furthermore, we introduce two mechanisms to fully realize the potential of multi-layer stacking. Finally, PSEO builds a hyperparameter space and searches for the optimal post-hoc ensemble strategy within it. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, including post-hoc designs in recent AutoML systems and state-of-the-art ensemble learning methods.
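The trade-off between diversity and performance in base model selection can be illustrated with a small sketch. It is not the paper's formulation: the objective matrix `Q`, the function name `select_subset`, and the parameters `errors`, `diversity`, `k`, and `weight` are all assumptions made for illustration, and the subset is found by exhaustive enumeration, which is only feasible for small pools and stands in for a relaxation-based solver.

```python
from itertools import combinations

import numpy as np


def select_subset(errors, diversity, k, weight):
    """Pick k models minimising z^T Q z over binary indicator vectors z.

    Assumed objective (hypothetical, for illustration only):
      - diagonal of Q: (1 - weight) * validation error of each model
      - off-diagonal:  -weight * pairwise diversity between two models
    so a larger `weight` favours diverse subsets over individually strong ones.
    """
    errors = np.asarray(errors, dtype=float)
    diversity = np.asarray(diversity, dtype=float)
    n = len(errors)

    Q = -weight * diversity                      # new array, input is untouched
    np.fill_diagonal(Q, (1.0 - weight) * errors)

    best_score, best_subset = float("inf"), None
    # Exhaustive search over all size-k subsets (small n only).
    for subset in combinations(range(n), k):
        z = np.zeros(n)
        z[list(subset)] = 1.0
        score = float(z @ Q @ z)
        if score < best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Toy pool: models 0 and 1 are accurate but make identical errors
# (zero diversity); model 2 is weaker but differs from both.
errors = [0.10, 0.11, 0.30]
diversity = [
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

print(select_subset(errors, diversity, k=2, weight=0.0)[0])  # (0, 1)
print(select_subset(errors, diversity, k=2, weight=0.5)[0])  # (0, 2)
```

With `weight=0.0` only individual error matters and the two most accurate models are chosen; with `weight=0.5` the diversity term dominates and the weaker but complementary model replaces one of them. In this toy setting, `k` and `weight` play the role of the ensemble size and diversity weight that a search over ensemble strategies would tune per task.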
