
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym

Elena Raponi, Nathanael Rakotonirina Carraz, Jérémy Rapin, Carola Doerr, Olivier Teytaud

TL;DR

This work extends the 2013 comparison by Hutter et al. by evaluating Bayesian optimization (BO) and related ML-driven approaches against classical black-box optimization (BBO) methods under low budgets, both on the BBOB/COCO suite and on direct policy search for OpenAI Gym. It shows that BO-based solvers excel when evaluations are limited, though at a higher computational cost, while traditional methods often outperform them as budgets grow. The study highlights cross-domain strengths, such as some BBO-community algorithms performing well on ML tasks, and emphasizes the impact of problem scaling and domain bounds on search behavior. Overall, the results suggest a nuanced view: ML-driven optimization is valuable for quick, small-budget tasks, whereas classical heuristics and domain-specific RL solvers gain traction as budgets increase or problem dimensionality grows, with scaling and initialization playing crucial roles.

Abstract

The growing ubiquity of machine learning (ML) has led it to enter various areas of computer science, including black-box optimization (BBO). Recent research is particularly concerned with Bayesian optimization (BO). BO-based algorithms are popular in the ML community, as they are used for hyperparameter optimization and, more generally, for algorithm configuration. However, their efficiency decreases as the dimensionality of the problem and the evaluation budget increase. Meanwhile, derivative-free optimization methods have evolved independently in the optimization community. It is therefore pressing to understand whether cross-fertilization is possible between the two communities, ML and BBO, i.e., whether algorithms that are heavily used in ML also work well in BBO and vice versa. Comparative experiments often involve rather small benchmarks and suffer from visible problems in the experimental setup, such as poor initialization of baselines, overfitting due to problem-specific hyperparameter settings, and low statistical significance. With this paper, we update and extend a comparative study presented by Hutter et al. in 2013. We compare BBO tools for ML with more classical heuristics, first on the well-known BBOB benchmark suite from the COCO environment and then on Direct Policy Search for OpenAI Gym, a reinforcement learning benchmark. Our results confirm that BO-based optimizers perform well on both benchmarks when budgets are limited, albeit with a higher computational cost, while they are often outperformed by algorithms from other families when the evaluation budget becomes larger. We also show that some algorithms from the BBO community perform surprisingly well on ML tasks.

Paper Structure

This paper contains 18 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Empirical cumulative distribution function (ECDF) of runtimes on the BBOB functions f1–f24, using 51 targets spaced uniformly on a log scale between $10^{-8}$ and $10^{2}$. Plots are shown for dimension D equal to (a) 2, (b) 3, (c) 5, (d) 10, (e) 20, and (f) 40, with budget $10D$. X-axis: budget/dimension (log scale). Y-axis: fraction of (run, target) pairs solved at the requested precision. Overall, Cobyla and NGOpt16 (which relies heavily on Cobyla) perform best in these examples.
  • Figure 2: As Fig. 1, but with a $100D$ budget and only for dimensions (a) 2, (b) 3, (c) 5, and (d) 10. Some methods that were too slow are excluded from the analysis. Overall, CMA algorithms and NGOpt16 perform best here.
  • Figure 3: Multi-deterministic OpenAI Gym with a tiny neural net: a random seed is drawn for each optimization run so that overfitting is more difficult. See Fig. 4 for an aggregated view. The legend lists all compared algorithms with two numbers in parentheses: the performance values for the last and second-to-last budgets on the x-axis, respectively. The first number is also used to sort the algorithms by performance.
  • Figure 4: Same results as in Fig. 3, but as the aggregated comparison provided by Nevergrad. Row A, column B shows the frequency at which method A outperformed method B for the given budget, over 9 distinct problems per budget. Only problems with dimension $D<50$ are included. Methods are ranked by average winning rate, best first. Note that the winning rates are all very close to each other: only PSO is significantly better. Fig. 5 presents similar experiments with bigger neural nets. Fig. 1 in the Supplementary Material extends the present results to budgets 800, 1600, and 3200.
  • Figure 5: Same as Fig. 4, but with bigger nets (neural factor 3 in Nevergrad's benchmark scaling), with 11 distinct problems per budget. We truncated at dimension $D \leq 264$, so the dimension ranges from 24 to 264 instead of from 8 to 40 as in Fig. 4. Due to the computational cost, the SMAC runs could not be completed. Fig. 1 in the Supplementary Material extends the present results to budgets 800, 1600, and 3200.
  • ...and 1 more figure
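The runtime ECDF described in Figure 1's caption can be sketched as follows. This is a minimal illustration, not the COCO postprocessing code. The function names are hypothetical, each run is assumed to be a list of best-so-far precisions (f - f_opt) indexed by evaluation count, and refinements of the official tooling (such as its treatment of unsuccessful runs) are omitted. The 51 targets are spaced uniformly in log10 between $10^{-8}$ and $10^{2}$, i.e. with a step of 0.2 in the exponent.

```python
import math
from typing import List, Sequence


def log_uniform_targets(lo: float = 1e-8, hi: float = 1e2, n: int = 51) -> List[float]:
    """Return n target precisions spaced uniformly on a log10 scale from lo to hi."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (n - 1)) for i in range(n)]


def first_hitting_times(trace: Sequence[float], targets: Sequence[float]) -> List[float]:
    """trace[i] is the best precision reached after i + 1 evaluations (hypothetical format).
    Returns, for each target, the evaluations needed to reach it (math.inf if never reached)."""
    times = []
    for t in targets:
        hit = math.inf
        for i, prec in enumerate(trace):
            if prec <= t:
                hit = i + 1
                break
        times.append(hit)
    return times


def runtime_ecdf(runs: Sequence[Sequence[float]], targets: Sequence[float],
                 budgets_per_dim: Sequence[float], dim: int) -> List[float]:
    """Fraction of all (run, target) pairs solved within budget * dim evaluations,
    evaluated at each x-axis value budget/dimension."""
    all_times = [h for trace in runs for h in first_hitting_times(trace, targets)]
    n = len(all_times)
    return [sum(1 for h in all_times if h <= b * dim) / n for b in budgets_per_dim]


targets = log_uniform_targets()
runs = [[10.0, 1.0, 1e-9], [1000.0, 500.0, 200.0]]  # toy traces, not benchmark data
print(runtime_ecdf(runs, targets, [0.5, 1.0, 1.5], dim=2))
```

In this toy example, the second run never reaches even the loosest target of $10^{2}$, so the curve plateaus at 0.5. This plateau is the same "fraction solved" reading that the y-axis of Figure 1 conveys.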
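The aggregated view described in Figure 4's caption can be sketched as follows. In that view, row A, column B holds the frequency at which A outperformed B, and methods are ranked by average winning rate. This is a hypothetical illustration, not Nevergrad's plotting code. It rests on three assumptions: lower final loss is better, each method's scores for a single budget are stored as a dict keyed by problem name, and ties count as half a win.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]  # scores[method][problem] = final loss for one budget


def winrate_matrix(scores: Scores) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Return methods ranked by mean win rate, and W with W[a][b] equal to the
    fraction of shared problems on which a beat b (ties count 1/2, an assumption)."""
    methods = list(scores)
    W: Dict[str, Dict[str, float]] = {m: {} for m in methods}
    for a, b in combinations(methods, 2):
        common = scores[a].keys() & scores[b].keys()
        if not common:
            continue  # no shared problems: leave the pair out of the matrix
        wins = 0.0
        for p in common:
            if scores[a][p] < scores[b][p]:
                wins += 1.0
            elif scores[a][p] == scores[b][p]:
                wins += 0.5
        W[a][b] = wins / len(common)
        W[b][a] = 1.0 - W[a][b]
    mean = {m: (sum(W[m].values()) / len(W[m]) if W[m] else 0.0) for m in methods}
    ranked = sorted(methods, key=lambda m: mean[m], reverse=True)  # best first
    return ranked, W


toy = {  # made-up losses, for illustration only
    "PSO": {"p1": 1.0, "p2": 1.0, "p3": 1.0},
    "CMA": {"p1": 2.0, "p2": 0.5, "p3": 2.0},
    "Random": {"p1": 3.0, "p2": 3.0, "p3": 3.0},
}
ranked, W = winrate_matrix(toy)
print(ranked, W["PSO"]["CMA"])
```

The matrix would be computed separately for each budget, which matches the caption's "for the given budget". A high average winning rate is only meaningful relative to the spread across methods. This is why the caption stresses that the rates in Figure 4 are all close, with only PSO significantly better.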