Freeze-Thaw Bayesian Optimization
Kevin Swersky, Jasper Snoek, Ryan Prescott Adams
TL;DR
The paper tackles efficient hyperparameter search by exploiting the partial progress of iterative training to pause ("freeze"), resume ("thaw"), or start new trials. It introduces a nonparametric kernel for training curves, derived from an infinite mixture of exponentially decaying basis functions, which allows final performance to be forecast from early observations. It also proposes a scalable spatiotemporal Gaussian process prior in which a GP over hyperparameters models each curve's asymptotic value and conditionally independent per-curve GPs model the temporal dynamics around it; this block structure allows efficient inference via the Woodbury identity instead of a cubic cost in the total number of observations. An information-theoretic acquisition strategy, entropy search, decides when to freeze, thaw, or initialize models by maximizing the expected information gained about the location of the minimum asymptotic loss. Experiments on logistic regression, online latent Dirichlet allocation (LDA), and probabilistic matrix factorization (PMF) show substantial speedups over previous Bayesian optimization methods, confirming the practicality of dynamic training management.
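A minimal sketch of the training-curve kernel, assuming a Gamma(α, β) mixing distribution over the decay rate λ: integrating exp(-λt)·exp(-λt') against that density gives the closed form k(t, t') = β^α / (t + t' + β)^α. The code checks the closed form against a Monte Carlo estimate of the mixture and then forecasts the asymptote of one partially observed curve. The function names, the values α = 1 and β = 0.5, the noise level, and the single-curve model that gives the asymptote a scalar Gaussian prior (rather than a GP over hyperparameters) are all hypothetical illustration choices.

```python
import numpy as np


def freeze_thaw_kernel(t, s, alpha=1.0, beta=0.5):
    """k(t, s) = beta**alpha / (t + s + beta)**alpha for time vectors t and s.

    Closed form of the integral of exp(-lam * t) * exp(-lam * s) over decay
    rates lam weighted by a Gamma(shape=alpha, rate=beta) density.
    """
    t = np.asarray(t, dtype=float)[:, None]
    s = np.asarray(s, dtype=float)[None, :]
    return beta**alpha / (t + s + beta) ** alpha


def mc_kernel(t, s, alpha=1.0, beta=0.5, n=200_000, seed=0):
    """Monte Carlo estimate of the same mixture, for checking the closed form."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)  # rate beta -> scale 1/beta
    return float(np.mean(np.exp(-lam * (t + s))))


def forecast_asymptote(t_obs, y_obs, mu0=0.0, var_mu=1.0, noise=1e-3, alpha=1.0, beta=0.5):
    """Posterior mean and variance of the asymptote mu of one training curve.

    Hypothetical single-curve model:
        y(t) = mu + f(t) + eps,  mu ~ N(mu0, var_mu),  f ~ GP(0, k),
    so Cov[y(t), y(s)] = var_mu + k(t, s) + noise * [t == s].
    Since k(t, t) -> 0 as t -> infinity, y(t) -> mu, and mu is independent
    of f and eps, so Cov[mu, y(t)] = var_mu.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    K = var_mu + freeze_thaw_kernel(t_obs, t_obs, alpha, beta) + noise * np.eye(len(t_obs))
    c = np.full(len(t_obs), var_mu)  # Cov[mu, y_obs]
    mean = mu0 + c @ np.linalg.solve(K, y_obs - mu0)
    var = var_mu - c @ np.linalg.solve(K, c)
    return float(mean), float(var)


# Usage: an early, partially observed curve that is still decaying.
t = np.arange(1, 11)
y = 0.3 + 0.5 * np.exp(-0.4 * t)
mean, var = forecast_asymptote(t, y)
print(f"forecast asymptote {mean:.3f} +/- {np.sqrt(var):.3f}")
```

Because every basis function exp(-λt) decays to zero, the kernel's variance vanishes as t grows, which is what lets the model treat the curve's limit as a separate quantity to be inferred.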
Abstract
In this paper we develop a dynamic form of Bayesian optimization for machine learning models with the goal of rapidly finding good hyperparameter settings. Our method uses the partial information gained during the training of a machine learning model in order to decide whether to pause training and start a new model, or resume the training of a previously considered model. We specifically tailor our method to machine learning problems by developing a novel positive-definite covariance kernel to capture a variety of training curves. Furthermore, we develop a Gaussian process prior that scales gracefully with additional temporal observations. Finally, we provide an information-theoretic framework to automate the decision process. Experiments on several common machine learning models show that our approach is extremely effective in practice.
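The scaling property can be illustrated with a small sketch, assuming a hierarchical structure in which a GP over hyperparameter settings gives each curve's asymptote and each curve adds an independent exponential-decay GP plus noise. Conditioning then needs one small solve per curve to form Lambda = O^T K_t^{-1} O and gamma = O^T K_t^{-1} (y - O m), where O maps each observation to its curve and K_t is block diagonal, followed by a single N x N solve over the N curves. The sketch checks that this matches conditioning on the full joint covariance. The squared-exponential covariance over hyperparameters, the kernel parameters, the noise level, and all function names are hypothetical illustration choices.

```python
import numpy as np


def decay_kernel(t, s, alpha=1.0, beta=0.5):
    """Exponential-decay kernel over training time: beta**alpha / (t + s + beta)**alpha."""
    t = np.asarray(t, dtype=float)[:, None]
    s = np.asarray(s, dtype=float)[None, :]
    return beta**alpha / (t + s + beta) ** alpha


def prior_cov(X, ell=1.0, jitter=1e-9):
    """Hypothetical squared-exponential covariance over hyperparameter settings X (N x D)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell**2) + jitter * np.eye(len(X))


def structured_posterior(X, curves, m=0.0, noise=1e-3):
    """Posterior over asymptotes using per-curve solves plus one N x N solve.

    Cost is O(sum_n T_n**3 + N**3) instead of O((sum_n T_n)**3).
    """
    N = len(curves)
    Lam = np.zeros((N, N))  # O^T K_t^{-1} O, diagonal because K_t is block diagonal
    gamma = np.zeros(N)     # O^T K_t^{-1} (y - O m)
    for n, (t, y) in enumerate(curves):
        K_n = decay_kernel(t, t) + noise * np.eye(len(t))
        w = np.linalg.solve(K_n, np.ones(len(t)))  # K_n^{-1} 1
        Lam[n, n] = w.sum()
        gamma[n] = w @ (np.asarray(y, dtype=float) - m)
    C = np.linalg.inv(np.linalg.inv(prior_cov(X)) + Lam)  # posterior covariance
    return m + C @ gamma, C


def naive_posterior(X, curves, m=0.0, noise=1e-3):
    """Same posterior from the full joint covariance over every observation."""
    total = sum(len(t) for t, _ in curves)
    O = np.zeros((total, len(curves)))  # observation -> curve indicator
    K_t = np.zeros((total, total))      # block-diagonal temporal covariance
    start = 0
    for n, (t, _) in enumerate(curves):
        stop = start + len(t)
        O[start:stop, n] = 1.0
        K_t[start:stop, start:stop] = decay_kernel(t, t) + noise * np.eye(len(t))
        start = stop
    y_all = np.concatenate([np.asarray(y, dtype=float) for _, y in curves])
    Sigma = prior_cov(X)
    A = Sigma @ O.T @ np.linalg.inv(O @ Sigma @ O.T + K_t)
    return m + A @ (y_all - m), Sigma - A @ O @ Sigma


# Usage: three hyperparameter settings, each trained for a different number of epochs.
X = np.array([[0.0], [1.0], [2.5]])
curves = [
    (np.arange(1, 6), 0.3 + 0.5 * np.exp(-0.4 * np.arange(1, 6))),
    (np.arange(1, 3), 0.6 + 0.2 * np.exp(-0.4 * np.arange(1, 3))),
    (np.arange(1, 9), 0.2 + 0.7 * np.exp(-0.4 * np.arange(1, 9))),
]
mean_s, cov_s = structured_posterior(X, curves)
mean_n, cov_n = naive_posterior(X, curves)
print(mean_s)
```

Adding an epoch to one curve only changes that curve's block, so the structured route stays cheap as temporal observations accumulate, while the naive route inverts a matrix whose size is the total number of observations.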
