Default Machine Learning Hyperparameters Do Not Provide Informative Initialization for Bayesian Optimization
Nicolás Villagrán Prieto, Eduardo C. Garrido-Merchán
TL;DR
This work investigates whether default hyperparameters shipped with ML libraries encode informative priors for Bayesian optimization (BO). It formalizes default-centered initialization by drawing initial points from a truncated Gaussian centered at library defaults and compares it to uniform random initialization across three BO back-ends, three model families, and five datasets, evaluating convergence speed and final predictive performance. Across all conditions, the study finds no statistically significant advantage to using defaults ($p$-values in $[0.141,\ 0.908]$) and shows that any early head start gained by concentrating the initial points more tightly around the defaults is temporary, with final outcomes matching random initialization. The results argue that defaults should not be relied upon as informative priors for BO and advocate for principled, data-driven search strategies in hyperparameter optimization.
Abstract
Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random initialization, default hyperparameter values shipped with popular ML libraries such as scikit-learn are often presumed to encode implicit expert knowledge and could therefore serve as informative starting points that accelerate convergence. This hypothesis, despite its intuitive appeal, has remained largely unexamined. We formalize the idea by initializing BO with points drawn from truncated Gaussian distributions centered at library defaults and compare the resulting trajectories against a uniform-random baseline. We conduct an extensive empirical evaluation spanning three BO back-ends (BoTorch, Optuna, Scikit-Optimize), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering classification and regression tasks. Performance is assessed through convergence speed and final predictive quality, and statistical significance is determined via one-sided binomial tests. Across all conditions, default-informed initialization yields no statistically significant advantage over purely random sampling, with $p$-values ranging from 0.141 to 0.908. A sensitivity analysis on the prior variance confirms that, while tighter concentration around the defaults improves early evaluations, this transient benefit vanishes as optimization progresses, leaving final performance unchanged. Our results provide no evidence that default hyperparameters encode useful directional information for optimization. We therefore recommend that practitioners treat hyperparameter tuning as an integral part of model development and favor principled, data-driven search strategies over heuristic reliance on library defaults.
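To make the two initialization schemes and the significance test concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a box-constrained continuous search space, a prior standard deviation expressed as a fraction of each parameter's range (`rel_sigma`, a hypothetical name), and truncation by rejection sampling. The search space `SPACE`, its bounds, and its center values are illustrative placeholders, not values taken from the study or guaranteed to match any library's defaults.

```python
import math
import random

# Hypothetical search space: name -> (lower bound, upper bound, default).
# All numbers are illustrative placeholders, not the paper's configuration.
SPACE = {
    "n_estimators": (10.0, 500.0, 100.0),
    "max_features": (0.1, 1.0, 0.5),
}


def sample_uniform(space, rng):
    """Baseline: draw each hyperparameter uniformly within its bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi, _) in space.items()}


def sample_default_centered(space, rng, rel_sigma=0.1):
    """Draw each hyperparameter from a Gaussian centered at its default,
    truncated to [lo, hi] by rejection sampling.

    rel_sigma (assumed parameterization): standard deviation as a
    fraction of the parameter's range width.
    """
    point = {}
    for name, (lo, hi, default) in space.items():
        sigma = rel_sigma * (hi - lo)
        while True:
            x = rng.gauss(default, sigma)
            if lo <= x <= hi:  # keep only draws inside the bounds
                point[name] = x
                break
    return point


def initial_design(sampler, space, n_init, seed=0):
    """Build the list of initial points handed to a BO back-end."""
    rng = random.Random(seed)
    return [sampler(space, rng) for _ in range(n_init)]


def one_sided_binomial_p(wins, trials, p0=0.5):
    """Exact one-sided p-value P(X >= wins) for X ~ Binomial(trials, p0).

    Here 'wins' would count paired comparisons in which default-centered
    initialization beat the uniform baseline (an assumed pairing scheme).
    """
    return sum(
        math.comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(wins, trials + 1)
    )


if __name__ == "__main__":
    informed = initial_design(sample_default_centered, SPACE, n_init=5, seed=1)
    baseline = initial_design(sample_uniform, SPACE, n_init=5, seed=1)
    print(informed[0])
    print(baseline[0])
    print(one_sided_binomial_p(wins=9, trials=15))
```

Shrinking `rel_sigma` concentrates the initial design around the defaults, which corresponds to the prior-variance sensitivity analysis described above. Integer-valued hyperparameters such as `n_estimators` would need rounding before being passed to a model, and parameters usually searched on a log scale would need the sampling applied in log space; both are omitted here for brevity.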
