BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Stefan Falkner, Aaron Klein, Frank Hutter

TL;DR

BOHB addresses the scalability challenge of hyperparameter optimization for large-scale ML by combining Hyperband's resource-aware scheduling with a KDE-based Bayesian optimization component. It evaluates configurations on budgets $b$ in a range $[b_{min}, b_{max}]$ and guides the configuration search with a single multidimensional KDE per density, instead of the hierarchy of one-dimensional KDEs used by TPE. Widened KDE bandwidths and a fraction of uniformly random configurations preserve exploration and diversity. Empirically, BOHB delivers strong anytime performance and rapid convergence to near-optimal configurations across diverse domains (SVMs, FFNNs, Bayesian NNs, RL, and CIFAR-10 CNNs), outperforms standalone Bayesian optimization and Hyperband in many settings, and scales well with parallel resources. The method is simple to implement and practical; open-source code is available to support broader deployment and future extensions such as budget adaptation.
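The KDE-based proposal step described above can be sketched as follows. This is a minimal, TPE-style sketch, not the authors' implementation: it assumes hyperparameters rescaled to $[0, 1]^d$, product-Gaussian kernels with Scott's-rule bandwidths, and the hypothetical helpers `fit_kde`, `kde_pdf`, `kde_sample`, and `propose`; the defaults for `gamma` (fraction of good configurations), `rho` (random fraction), and `bw_factor` (bandwidth widening) are assumptions. Given observations from one budget, it fits one KDE $\ell(x)$ to the best configurations and one KDE $g(x)$ to the rest, samples candidates from a widened copy of $\ell$, and returns the candidate that maximizes $\ell(x)/g(x)$.

```python
import math
import random
from typing import List, Sequence, Tuple

Config = List[float]
KDE = Tuple[Sequence[Config], List[float]]


def fit_kde(points: Sequence[Config], bw_scale: float = 1.0) -> KDE:
    """Product-Gaussian KDE over configs in [0, 1]^d (assumed encoding).

    Per-dimension bandwidths use Scott's rule (an assumption), floored at 1e-3
    so that a collapsed dimension never gets a zero bandwidth.
    """
    n, d = len(points), len(points[0])
    bandwidths = []
    for j in range(d):
        col = [p[j] for p in points]
        mean = sum(col) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in col) / max(n - 1, 1))
        bandwidths.append(max(std * n ** (-1.0 / (d + 4)), 1e-3) * bw_scale)
    return points, bandwidths


def kde_pdf(kde: KDE, x: Config) -> float:
    """Average of product-Gaussian kernels centred on the stored points."""
    points, bws = kde
    total = 0.0
    for p in points:
        k = 1.0
        for xj, pj, bj in zip(x, p, bws):
            k *= math.exp(-0.5 * ((xj - pj) / bj) ** 2) / (bj * math.sqrt(2.0 * math.pi))
        total += k
    return total / len(points)


def kde_sample(kde: KDE, rng: random.Random) -> Config:
    """Pick a stored point, perturb each coordinate, clip back into [0, 1]."""
    points, bws = kde
    center = rng.choice(points)
    return [min(1.0, max(0.0, rng.gauss(c, b))) for c, b in zip(center, bws)]


def propose(observations: Sequence[Tuple[Config, float]], dim: int, rng: random.Random,
            gamma: float = 0.15, rho: float = 1 / 3, n_samples: int = 64,
            bw_factor: float = 3.0) -> Config:
    """Propose the next config from (config, loss) pairs observed on one budget."""
    n_min = dim + 1
    # Random exploration: a fraction rho of proposals, or too little data for two KDEs.
    if rng.random() < rho or len(observations) < 2 * n_min:
        return [rng.random() for _ in range(dim)]
    ranked = sorted(observations, key=lambda o: o[1])
    n_good = max(n_min, int(gamma * len(ranked)))
    good = [c for c, _ in ranked[:n_good]]
    bad = [c for c, _ in ranked[n_good:]]
    l_kde, g_kde = fit_kde(good), fit_kde(bad)
    wide = fit_kde(good, bw_scale=bw_factor)  # widened bandwidths for exploration
    best, best_ratio = None, -math.inf
    for _ in range(n_samples):
        x = kde_sample(wide, rng)
        ratio = kde_pdf(l_kde, x) / max(kde_pdf(g_kde, x), 1e-32)
        if ratio > best_ratio:
            best, best_ratio = x, ratio
    return best


rng = random.Random(0)
history = []
for _ in range(20):
    x = propose(history, dim=2, rng=rng)
    loss = (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2  # toy loss on a single budget
    history.append((x, loss))
print(min(history, key=lambda o: o[1]))
```

Because $\ell(x)/g(x)$ is large where good configurations are dense and bad ones are sparse, it acts as the acquisition function; the uniformly random fraction keeps some proposals independent of the model.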

Abstract

Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations as quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.
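The bandit-based part builds on Hyperband, which runs successive halving in several brackets that trade off the number of configurations against the budget each one receives. A minimal sketch of that schedule follows, assuming $b_{max}/b_{min}$ is a power of the halving rate $\eta$; the function name `hyperband_brackets` and the default $\eta = 3$ are illustrative assumptions. Each bracket lists `(n_configs, budget)` pairs, and after every round only the best $1/\eta$ of the configurations advance to the next, $\eta$-times larger budget.

```python
from typing import List, Tuple


def hyperband_brackets(b_min: float, b_max: float, eta: int = 3) -> List[List[Tuple[int, float]]]:
    """Successive-halving rounds for every Hyperband bracket.

    Returns one list per bracket, most aggressive first; each entry is
    (number of configurations, budget per configuration) for one round.
    """
    # s_max = floor(log_eta(b_max / b_min)), computed without float-log rounding.
    s_max = 0
    while b_min * eta ** (s_max + 1) <= b_max + 1e-9:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta**s / (s + 1)).
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        rounds = []
        for i in range(s + 1):
            n_i = n // eta ** i            # keep the best 1/eta each round
            b_i = b_max / eta ** (s - i)   # budget grows by eta each round
            rounds.append((n_i, b_i))
        brackets.append(rounds)
    return brackets


for bracket in hyperband_brackets(1, 27):
    print(bracket)
```

For $b_{min} = 1$ and $b_{max} = 27$, the most aggressive bracket starts 27 configurations on budget 1 and keeps 9, 3, and finally 1 configuration on budgets 3, 9, and 27. In BOHB, the configurations that start each bracket are proposed by the KDE model instead of being drawn purely at random.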

Paper Structure

This paper contains 18 sections, 4 equations, 7 figures.

Figures (7)

  • Figure 1: Illustration of typical results obtained, here for optimizing six hyperparameters of a neural network. We show the immediate regret of the best configuration found by 4 methods as a function of time. Hyperband has strong anytime performance, but for larger budgets does not perform much better than random search. In contrast, Bayesian optimization starts slowly (like random search), but given enough time outperforms Hyperband. Our new method BOHB achieves the best of both worlds, starting fast and also converging to the global optimum quickly.
  • Figure 2: Performance of our method with different numbers of parallel workers on the letter surrogate benchmark (see the experiments section) for 128 iterations. The speedup for two and four workers is close to linear; for more workers it becomes sublinear. For example, the speedup to achieve a regret of $10^{-2}$ with 32 workers instead of one is about $2000\,\mathrm{s} / 130\,\mathrm{s} \approx 15$. We plot the mean and twice the standard error of the mean over 128 runs.
  • Figure 3: Results for the counting ones problem in a 16-dimensional space with 8 categorical and 8 continuous hyperparameters. In higher-dimensional spaces, methods based on random search (RS) need exponentially more samples to find good solutions.
  • Figure 4: Comparison on the SVM on MNIST surrogates described by Klein et al. (2017). BOHB performs similarly to Fabolas on this two-dimensional benchmark and outperforms MTBO and HB.
  • Figure 5: Optimizing six hyperparameters of a feed-forward neural network on featurized datasets; results are based on surrogate benchmarks. Results for the other 5 datasets are qualitatively similar and are shown in Figure 1 of the supplementary material.
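The counting ones problem in Figure 3 is a cheap toy benchmark with a known optimum. Below is a minimal sketch of its objective, assuming that binary categorical hyperparameters count directly and that each continuous hyperparameter $p \in [0, 1]$ contributes $\mathbb{E}[\mathrm{Bernoulli}(p)] = p$, estimated from $b$ random draws so that the budget $b$ controls evaluation noise. The function name `counting_ones` and its signature are hypothetical; see the paper for the exact definition.

```python
import random
from typing import Sequence


def counting_ones(x_cat: Sequence[int], x_cont: Sequence[float], budget: int,
                  rng: random.Random) -> float:
    """Hypothetical counting-ones loss (lower is better).

    x_cat:  binary categorical values (0 or 1), counted exactly.
    x_cont: continuous values p in [0, 1]; each contributes a Monte Carlo
            estimate of E[Bernoulli(p)] = p from `budget` samples.
    """
    total = float(sum(x_cat))
    for p in x_cont:
        hits = sum(1 for _ in range(budget) if rng.random() < p)
        total += hits / budget
    return -total  # minimizing this maximizes the number of ones


rng = random.Random(0)
print(counting_ones([1] * 8, [0.5] * 8, budget=9, rng=rng))    # noisy estimate
print(counting_ones([1] * 8, [0.5] * 8, budget=729, rng=rng))  # approximately -12
```

With all 16 hyperparameters set to 1, the loss reaches its minimum of $-16$ at any budget; for intermediate continuous values, small budgets give noisy estimates, so low-budget evaluations are only approximately informative.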