Selecting Hyperparameters for Tree-Boosting

Floris Jan Koster; Fabio Sigrist

TL;DR

This study addresses the difficulty of tuning hyperparameters for tree-boosting on tabular data by benchmarking several hyperparameter optimization methods on 59 OpenML datasets. It compares deterministic full grid search, random grid search, GP-BO, TPE, Hyperband, and SMAC, using a LightGBM-based boosting setup evaluated consistently with nested cross-validation and a fixed trial budget. The key finding is that SMAC consistently delivers the best predictive performance, with TPE a common runner-up, while default hyperparameters and some search strategies lag behind. Effective tuning typically requires more than 100 trials, and all considered hyperparameters can materially affect accuracy. The paper also shows that, for regression tasks, choosing the number of boosting iterations by early stopping often outperforms including it in the search space, and a meta-analysis reveals that no single hyperparameter dominates in importance. In practice, practitioners should favor SMAC with a sufficiently large budget, avoid defaults, and use early stopping for regression, while recognizing that no single method is superior on every dataset. Limitations include the focus on a single boosting implementation and the lack of an exhaustive treatment of cost-benefit trade-offs, cross-task transfer, and multi-fidelity strategies.

Abstract

Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy depends critically on multiple hyperparameters. In this article, we empirically compare several popular hyperparameter optimization methods for tree-boosting, including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search, using $59$ regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials, more than $100$, is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small set of hyperparameters that is more important than the others, and (iv) for regression tasks, choosing the number of boosting iterations using early stopping yields more accurate results than including it in the search space.
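
To make the setting in finding (iv) concrete, the following is a minimal sketch of random grid search under a fixed trial budget, where the number of boosting iterations is chosen per trial by early stopping rather than being a searched dimension. It is not the paper's code: the grid values, the function names, and in particular toy_validation_loss (a stand-in for actually training a booster and scoring it on validation data) are assumptions made purely for illustration.

```python
import itertools
import math
import random

# Hypothetical grid: names mirror common tree-boosting hyperparameters,
# but the values are illustrative and not taken from the paper.
GRID = {
    "learning_rate": [0.01, 0.1, 1.0],
    "num_leaves": [4, 32, 256],
    "min_data_in_leaf": [1, 10, 100],
}


def toy_validation_loss(config, iteration):
    """Stand-in for a real validation loss after `iteration` boosting rounds.

    Assumption: the loss is a shrinking underfitting term plus a growing
    overfitting term, so it is U-shaped in the number of iterations.
    """
    lr = config["learning_rate"]
    underfit = math.exp(-lr * iteration / 10.0)
    overfit = 1e-4 * lr * config["num_leaves"] * iteration / config["min_data_in_leaf"]
    return underfit + overfit


def early_stopped_iterations(loss_at, max_iter=1000, patience=20):
    """Return (best_iteration, best_loss), stopping once the loss has not
    improved for `patience` consecutive iterations."""
    best_loss, best_iter = float("inf"), 0
    for it in range(1, max_iter + 1):
        loss = loss_at(it)
        if loss < best_loss:
            best_loss, best_iter = loss, it
        elif it - best_iter >= patience:
            break
    return best_iter, best_loss


def random_grid_search(grid, n_trials, seed=0):
    """Evaluate `n_trials` grid points drawn without replacement.

    The number of iterations is not part of the grid; each trial picks it
    by early stopping. Returns (best_result, all_results), where each
    result is a tuple (loss, n_iterations, config).
    """
    rng = random.Random(seed)
    names = sorted(grid)
    all_configs = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[n] for n in names))
    ]
    trials = rng.sample(all_configs, min(n_trials, len(all_configs)))
    results = []
    for config in trials:
        n_iter, loss = early_stopped_iterations(
            lambda it: toy_validation_loss(config, it)
        )
        results.append((loss, n_iter, config))
    return min(results, key=lambda r: r[0]), results


if __name__ == "__main__":
    best, _ = random_grid_search(GRID, n_trials=10)
    print("best loss, iterations, config:", best)
```

With a real booster, toy_validation_loss would be replaced by training on the inner training fold and scoring on the inner validation fold. Setting n_trials at or above the grid size turns the same routine into a deterministic full grid search, which is one of the baselines the abstract lists.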

Paper Structure (12 sections, 8 equations, 11 figures, 5 tables)

Introduction
Related literature
Experimental settings
Hyperparameter selection methods and software used
Data sets
Train-test splits
Hyperparameters and search spaces
Evaluation scores and aggregation across data sets
Results
The importance of individual hyperparameters
Conclusion
Additional results

Figures (11)

Figure 1: Normalized scores as a function of number of trials. The confidence intervals represent the uncertainty across data sets.
Figure 2: SHAP summary plot for regression (top) and classification (bottom) tasks.
Figure 3: Relative differences to the best score as a function of the number of trials.
Figure 4: Normalized scores as a function of the number of trials. The confidence intervals represent uncertainty due to the randomness in the hyperparameter selection methods.
Figure 5: $R^2$ as a function of the number of trials per data set. The confidence intervals represent uncertainty due to the randomness in the hyperparameter selection methods.
...and 6 more figures
