Hyperparameter Optimization in Machine Learning
Luca Franceschi, Michele Donini, Valerio Perrone, Aaron Klein, Cédric Archambeau, Matthias Seeger, Massimiliano Pontil, Paolo Frasconi
TL;DR
This survey frames hyperparameter optimization (HPO) as a structured, repeatable process that is essential to the performance of modern ML systems. It categorizes the main families of HPO approaches (elementary grid, random, and quasi-random search; model-based Bayesian optimization; multi-fidelity strategies; population-based algorithms; and gradient-based methods using hypergradients), clarifying their trade-offs, their parallelizability, and practical considerations. It further surveys extended topics such as multi-objective and constrained HPO, neural architecture search, meta-learning, and transfer across model scales, and it reviews HPO systems and benchmarking ecosystems. The work concludes with open questions and future directions, emphasizing reproducibility, efficiency, and applicability to large-scale foundation models and unsupervised settings, all of which bear directly on automated, scalable ML development.
Abstract
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence, and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often time-consuming and becomes infeasible when the number of hyperparameters is large. Automating the search is an important step towards advancing, streamlining, and systematizing machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples, insights into the state of the art, and numerous links to further reading. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, and bandit-, model-, population-, and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields, such as meta-learning and neural architecture search, and conclude with open questions and future research directions.
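To make the basic automated search loop concrete, here is a minimal sketch of random search, one of the elementary methods named above. It is not code from the survey: the objective `validation_loss`, the two hyperparameters, and their log-uniform ranges are hypothetical stand-ins for training a real model and measuring its validation error.

```python
import math
import random


def validation_loss(learning_rate: float, weight_decay: float) -> float:
    # Hypothetical toy objective standing in for "train a model, then measure
    # validation error"; its minimum is at learning_rate=1e-2, weight_decay=1e-4.
    return (math.log10(learning_rate) + 2) ** 2 + (math.log10(weight_decay) + 4) ** 2


def random_search(n_trials: int, seed: int = 0):
    """Evaluate n_trials randomly sampled configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        # Sample each hyperparameter log-uniformly from an assumed search range.
        config = {
            "learning_rate": 10 ** rng.uniform(-5, 0),
            "weight_decay": 10 ** rng.uniform(-6, -1),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(n_trials=50)
print(best_config, best_loss)
```

Each trial is independent of the others, which is why random search parallelizes trivially. Replacing the sampler with a fixed grid or a low-discrepancy sequence would turn the same loop into grid or quasi-random search, while model-based methods would instead choose the next configuration using the results observed so far.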
