Selecting Hyperparameters for Tree-Boosting
Floris Jan Koster, Fabio Sigrist
TL;DR
This study addresses the difficulty of tuning hyperparameters for tree-boosting on tabular data by benchmarking several hyperparameter optimization methods on 59 OpenML datasets. It systematically compares deterministic full grid search, random grid search, GP-BO, TPE, Hyperband, and SMAC, using a LightGBM-based boosting setup evaluated consistently through nested cross-validation with a fixed trial budget. The key finding is that SMAC delivers the best predictive performance overall, with TPE a common runner-up, while default hyperparameters and some search strategies lag behind. Effective tuning typically requires more than 100 trials, and all hyperparameters materially affect accuracy. The paper also shows that, for regression tasks, choosing the number of boosting iterations by early stopping often outperforms including it in the search space, and a meta-analysis reveals that no single hyperparameter dominates in importance. In practice, practitioners should favor SMAC with a sufficiently large budget, avoid defaults, and use early stopping for regression, while recognizing that no single method is best on every dataset. Limitations include the focus on a single boosting implementation and the lack of an exhaustive treatment of cost–benefit trade-offs, cross-task transfer, and multi-fidelity strategies.
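To make the distinction between deterministic full grid search and random grid search under a fixed trial budget concrete, the following minimal sketch may help. It is an illustration only: the grid values, the `toy_validation_loss` function, and the budget of 20 trials are assumptions made for this example and are not taken from the paper, where each trial would involve training and validating a boosting model.

```python
import itertools
import math
import random

# Hypothetical search grid. The names mirror common LightGBM parameters,
# but the values are illustrative assumptions, not the paper's grid.
GRID = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "num_leaves": [2, 8, 64, 512],
    "min_data_in_leaf": [1, 10, 100],
    "lambda_l2": [0.0, 1.0, 10.0],
}


def full_grid(grid):
    """Enumerate every combination in the grid (deterministic full grid search)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in itertools.product(*grid.values())]


def random_grid(grid, n_trials, seed=0):
    """Draw up to n_trials distinct grid points uniformly at random."""
    candidates = full_grid(grid)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_trials, len(candidates)))


def toy_validation_loss(params):
    """Stand-in for a validation loss (assumption); lower is better."""
    return (
        (math.log10(params["learning_rate"]) + 1.0) ** 2
        + 0.1 * (math.log2(params["num_leaves"]) - 3.0) ** 2
        + 0.001 * abs(params["min_data_in_leaf"] - 10)
        + 0.05 * abs(params["lambda_l2"] - 1.0)
    )


def best_config(configs, loss_fn):
    """Return the configuration with the smallest loss among those tried."""
    return min(configs, key=loss_fn)


all_configs = full_grid(GRID)                # 4 * 4 * 3 * 3 = 144 trials
budgeted_configs = random_grid(GRID, n_trials=20)

best_full = best_config(all_configs, toy_validation_loss)
best_random = best_config(budgeted_configs, toy_validation_loss)
print(len(all_configs), best_full)
print(len(budgeted_configs), best_random)
```

The full grid always finds the best grid point but its cost grows multiplicatively with each added hyperparameter, whereas the random variant caps the cost at the chosen budget and can only match or fall short of the full-grid optimum on the same grid.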
Abstract
Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy is critically dependent on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting, including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search, using $59$ regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials, more than $100$, is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small set of hyperparameters that is more important than the others, and (iv) for regression tasks, choosing the number of boosting iterations using early stopping yields more accurate results than including it in the search space.
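Observation (iv) contrasts two ways of setting the number of boosting iterations: treating it as one more hyperparameter in the search space, or fixing the other hyperparameters and letting early stopping pick it from the validation loss. The sketch below illustrates the early-stopping rule in isolation. The function name `early_stopping_iteration`, the `patience` values, and the loss curve are hypothetical and chosen for illustration; the abstract does not specify the exact stopping criterion used in the study.

```python
def early_stopping_iteration(val_losses, patience):
    """Pick the number of boosting iterations by early stopping.

    val_losses[i - 1] is the validation loss after i boosting iterations.
    Scanning stops once the loss has not improved for `patience`
    consecutive iterations. Returns (best_iteration, best_loss), with
    iterations counted from 1; returns (0, inf) for an empty curve.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_iter, best_loss = i, loss
        elif i - best_iter >= patience:
            break  # no improvement for `patience` iterations
    return best_iter, best_loss


# Hypothetical validation-loss curve (assumption, not data from the paper).
hypothetical_curve = [1.0, 0.8, 0.65, 0.6, 0.61, 0.62, 0.63, 0.5]

print(early_stopping_iteration(hypothetical_curve, patience=3))   # -> (4, 0.6)
print(early_stopping_iteration(hypothetical_curve, patience=10))  # -> (8, 0.5)
```

With early stopping, each trial determines its own iteration count from a single training run, so the trial budget is spent only on the remaining hyperparameters. The second call shows the usual caveat: a small `patience` can stop before a later improvement.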
