Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym
Elena Raponi, Nathanael Rakotonirina Carraz, Jérémy Rapin, Carola Doerr, Olivier Teytaud
TL;DR
This work extends the 2013 comparison of Hutter et al., evaluating Bayesian optimization (BO) and related ML-driven approaches against classical black-box optimization (BBO) methods on both the BBOB/COCO suite and OpenAI Gym direct policy search under low budgets. It shows that BO-based solvers excel with limited evaluations, though at a higher computational cost, while traditional methods often outperform them as budgets grow. The study highlights cross-domain strengths, such as some algorithms from the BBO community performing well on ML tasks, and emphasizes the impact of problem scaling and domain bounds on search behavior. Overall, the results suggest a nuanced view: ML-driven optimization is valuable for quick, small-budget tasks, whereas classical heuristics and domain-specific RL solvers become more competitive as the budget or the problem dimensionality grows, with scaling and initialization playing crucial roles.
Abstract
The growing ubiquity of machine learning (ML) has brought it into various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, where they are used for hyperparameter optimization and, more generally, for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the evaluation budget increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. It is therefore important to understand whether cross-fertilization between the two communities, ML and BBO, is possible, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and exhibit visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific setting of hyperparameters, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit at a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.
