Default Machine Learning Hyperparameters Do Not Provide Informative Initialization for Bayesian Optimization

Nicolás Villagrán Prieto, Eduardo C. Garrido-Merchán

TL;DR

This work investigates whether default hyperparameters shipped with ML libraries encode informative priors for Bayesian optimization. It formalizes default-centered initialization by drawing initial points from a truncated Gaussian centered at library defaults and compares it to uniform random initialization across three BO back-ends, three model families, and five datasets, evaluating convergence speed and final predictive performance. Across all conditions, the study finds no statistically significant advantage to using defaults ($p$-values in $[0.141,\ 0.908]$) and shows that any early head start from tighter concentration is temporary, with final outcomes matching random initialization. The results argue that defaults should not be relied upon as informative priors for BO and advocate for principled, data-driven search strategies in hyperparameter optimization.

Abstract

Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random initialization, default hyperparameter values shipped with popular ML libraries such as scikit-learn encode implicit expert knowledge and could serve as informative starting points that accelerate convergence. This hypothesis, despite its intuitive appeal, has remained largely unexamined. We formalize the idea by initializing BO with points drawn from truncated Gaussian distributions centered at library defaults and compare the resulting trajectories against a uniform-random baseline. We conduct an extensive empirical evaluation spanning three BO back-ends (BoTorch, Optuna, Scikit-Optimize), three model families (Random Forests, Support Vector Machines, Multilayer Perceptrons), and five benchmark datasets covering classification and regression tasks. Performance is assessed through convergence speed and final predictive quality, and statistical significance is determined via one-sided binomial tests. Across all conditions, default-informed initialization yields no statistically significant advantage over purely random sampling, with p-values ranging from 0.141 to 0.908. A sensitivity analysis on the prior variance confirms that, while tighter concentration around the defaults improves early evaluations, this transient benefit vanishes as optimization progresses, leaving final performance unchanged. Our results provide no evidence that default hyperparameters encode useful directional information for optimization. We therefore recommend that practitioners treat hyperparameter tuning as an integral part of model development and favor principled, data-driven search strategies over heuristic reliance on library defaults.
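The one-sided binomial test mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the test counts how often default-centered initialization beats the uniform baseline over paired comparisons, with a null win probability of 0.5. The function name `one_sided_binomial_p` and the counts used below are hypothetical and are not taken from the paper.

```python
from math import comb


def one_sided_binomial_p(wins: int, n: int, p0: float = 0.5) -> float:
    """Return P(X >= wins) for X ~ Binomial(n, p0).

    Assumption: `wins` counts the paired comparisons in which default-centered
    initialization outperformed uniform initialization, and p0 = 0.5 encodes
    the null hypothesis that neither strategy is better.
    """
    if not 0 <= wins <= n:
        raise ValueError("wins must be between 0 and n")
    # Sum the upper tail of the binomial probability mass function.
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(wins, n + 1)
    )


# Hypothetical counts for illustration only (not results from the paper).
p_value = one_sided_binomial_p(wins=7, n=10)
print(p_value)  # 0.171875
```

Under these assumptions, even 7 wins out of 10 would not reject the null at the 0.05 level, which shows how a modest win count is compatible with no real advantage.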

Paper Structure (14 sections, 8 equations, 4 figures, 7 tables, 1 algorithm)

Figures (4)

  • Figure 1: Illustration of one iteration of Bayesian optimization. Top: The GP posterior (blue curve and shaded 95% confidence interval) is fitted to four observations (red dots). The dashed line shows the true (hidden) objective. Bottom: The Expected Improvement acquisition function identifies the next evaluation point $x_{t+1}$ (green marker) by balancing exploration and exploitation.
  • Figure 2: Comparison of initialization strategies for Bayesian optimization. (a) Uniform random initialization draws points (red dots) from a uniform distribution over the entire hyperparameter domain $[a, b]$. (b) Default-centered initialization draws points from a truncated Gaussian centered at the library default value $\mu$, concentrating initial evaluations in the neighborhood of the default configuration.
  • Figure 3: Effect of initialization strategy on the GP posterior. (a) Uniformly spread observations yield a well-calibrated posterior across the entire domain, enabling the surrogate to capture the global structure of the objective (dashed line). The true optimum (green dotted line) falls within a region of low uncertainty. (b) Observations clustered near the library default produce a posterior that is overconfident locally but highly uncertain elsewhere, risking suppressed exploration near the true optimum.
  • Figure 4: Truncated Gaussian distributions for different values of the concentration parameter $\lambda$. For $\lambda = 0.05$, approximately 68% of the probability mass lies within 10% of the parameter range around the default; for $\lambda = 0.30$, the distribution is nearly uniform. The red vertical line marks the library default value.
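The two initialization strategies contrasted in Figures 2 and 4 can be sketched for a single hyperparameter as follows. This is a rough illustration under stated assumptions, not the paper's implementation: it assumes the standard deviation is $\sigma = \lambda (b - a)$, which this summary does not state explicitly, and it uses simple rejection sampling to truncate the Gaussian to $[a, b]$. The function names and the example range are hypothetical.

```python
import random


def sample_uniform(a, b, n, rng):
    """Uniform random initialization over the domain [a, b]."""
    return [rng.uniform(a, b) for _ in range(n)]


def sample_default_centered(a, b, default, lam, n, rng):
    """Draw n points from a Gaussian centered at `default`, truncated to [a, b].

    Assumption: sigma = lam * (b - a), i.e. the concentration parameter scales
    the standard deviation by the width of the parameter range.
    Truncation is done by rejection: out-of-range draws are discarded.
    """
    if not a <= default <= b:
        raise ValueError("default must lie inside [a, b]")
    sigma = lam * (b - a)
    points = []
    while len(points) < n:
        x = rng.gauss(default, sigma)
        if a <= x <= b:
            points.append(x)
    return points


rng = random.Random(0)
a, b, default = 0.0, 1.0, 0.5  # hypothetical normalized range and default

uniform_init = sample_uniform(a, b, 5, rng)
tight_init = sample_default_centered(a, b, default, 0.05, 5, rng)
wide_init = sample_default_centered(a, b, default, 0.30, 5, rng)

print(uniform_init)
print(tight_init)
print(wide_init)
```

With $\lambda = 0.05$ the draws cluster near the default, while $\lambda = 0.30$ spreads them over most of the range. Rejection sampling is chosen here only for brevity; an inverse-CDF sampler would avoid discarded draws when the default sits close to a boundary.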