HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO
Katharina Eggensperger, Philipp Müller, Neeratyoy Mallik, Matthias Feurer, René Sass, Aaron Klein, Noor Awad, Marius Lindauer, Frank Hutter
TL;DR
HPOBench addresses the need for realistic, diverse, and reproducible benchmarks for hyperparameter optimization, with a particular emphasis on multi-fidelity problems. It delivers a containerized library of 12 benchmark families (7 existing, 5 new) totaling over 100 multi-fidelity problems, plus surrogate and tabular variants to enable scalable evaluations. The paper demonstrates broad compatibility by evaluating 13 optimizers from 6 optimization tools and shows that advanced multi-fidelity methods offer substantial benefits at small budgets while remaining competitive at larger budgets. This benchmark suite aims to standardize and accelerate progress in HPO, NAS, and transfer-MLO across diverse datasets and fidelities, enabling fair comparisons and long-term maintainability.
Abstract
To achieve peak predictive performance, hyperparameter optimization (HPO) is a crucial component of machine learning and its applications. Over the last few years, the number of efficient algorithms and tools for HPO has grown substantially. At the same time, the community still lacks realistic, diverse, computationally cheap, and standardized benchmarks. This is especially the case for multi-fidelity HPO methods. To close this gap, we propose HPOBench, which includes 7 existing and 5 new benchmark families, with a total of more than 100 multi-fidelity benchmark problems. HPOBench makes it possible to run this extendable set of multi-fidelity HPO benchmarks in a reproducible way by isolating and packaging the individual benchmarks in containers. It also provides surrogate and tabular benchmarks for computationally affordable yet statistically sound evaluations. To demonstrate HPOBench's broad compatibility with various optimization tools, as well as its usefulness, we conduct an exemplary large-scale study evaluating 13 optimizers from 6 optimization tools. We provide HPOBench here: https://github.com/automl/HPOBench.
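For readers new to the term, the sketch below illustrates what a multi-fidelity benchmark problem looks like in general: an objective that takes a configuration and a fidelity (here, a budget) and returns a loss together with its evaluation cost. It is a toy illustration and does not use HPOBench's API. The names `toy_objective` and `successive_halving`, the synthetic loss, and the result keys are all assumptions made for this example. The optimizer shown is a plain successive-halving loop, chosen only as a simple instance of a multi-fidelity method.

```python
import math
import random


def toy_objective(config, fidelity):
    """Hypothetical multi-fidelity objective (not an HPOBench benchmark).

    The synthetic "true" loss is minimized at learning_rate = 1e-2.
    A bias of 1 / budget is added, so cheap evaluations are less accurate.
    """
    budget = fidelity["budget"]
    true_loss = (math.log10(config["learning_rate"]) + 2.0) ** 2
    return {"function_value": true_loss + 1.0 / budget, "cost": float(budget)}


def successive_halving(configs, min_budget=1, max_budget=27, eta=3):
    """Evaluate all configs cheaply, keep the best 1/eta, and raise the budget."""
    survivors = list(configs)
    budget = min_budget
    total_cost = 0.0
    while True:
        results = []
        for config in survivors:
            result = toy_objective(config, {"budget": budget})
            total_cost += result["cost"]
            results.append((result["function_value"], config))
        results.sort(key=lambda pair: pair[0])  # lowest loss first
        if budget >= max_budget or len(results) == 1:
            best_loss, best_config = results[0]
            return best_config, best_loss, total_cost
        keep = max(1, len(results) // eta)
        survivors = [config for _, config in results[:keep]]
        budget = min(budget * eta, max_budget)


if __name__ == "__main__":
    rng = random.Random(0)
    # Sample 27 learning rates log-uniformly from [1e-5, 1].
    candidates = [{"learning_rate": 10 ** rng.uniform(-5, 0)} for _ in range(27)]
    best, loss, cost = successive_halving(candidates)
    full_cost = len(candidates) * 27  # cost of evaluating every config at max budget
    print(best, loss, cost, full_cost)
```

With 27 candidates and `eta=3`, the loop spends 27 budget units at each of the four budget levels (1, 3, 9, 27), for a total of 108, compared with 729 units for evaluating every candidate at the highest budget. That gap is the kind of saving multi-fidelity methods aim for at small budgets.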
