Impact of Training Instance Selection on Automated Algorithm Selection Models for Numerical Black-box Optimization
Konstantin Dietrich, Diederick Vermetten, Carola Doerr, Pascal Kerschke
TL;DR
This work evaluates how training instance selection affects automated algorithm selection (AAS) for numerical black-box optimization, using functions produced by the MA-BBOB generator. It generates $11{,}800$ functions in each of the dimensions $d\in\{2,5\}$ and analyzes a portfolio of eight solvers together with 74 exploratory landscape analysis (ELA) features. The study shows that training only on the BBOB component functions yields poor generalization and that the match between training and test distributions dominates AAS performance, though a mismatch can be mitigated by larger training sets. Three training-set strategies (uniform sampling, greedy diversity-based selection, and the BBOB component functions) are compared. The comparison reveals that the shape of the training distribution and the sampling strategy critically influence the learned selector's ability to close the gap between the single best solver (SBS) and the virtual best solver (VBS) on unseen data. The results highlight practical considerations for deploying AAS tools in real-world optimization, including the trade-off between data collection cost and predictive accuracy, and suggest directions for heterogeneous training data strategies and for benchmarking beyond academic suites.
Abstract
The recently proposed MA-BBOB function generator provides a way to create numerical black-box benchmark problems based on the well-established BBOB suite. Initial studies on this generator highlighted its ability to transition smoothly between the component functions, both from a low-level landscape feature perspective and with regard to algorithm performance. This suggests that MA-BBOB-generated functions can be an ideal testbed for automated machine learning methods, such as automated algorithm selection (AAS). In this paper, we generate $11{,}800$ functions in each of the dimensions $d=2$ and $d=5$ and analyze the potential gains from AAS by studying performance complementarity within a set of eight algorithms. We combine this performance data with exploratory landscape features to create an AAS pipeline, which we use to investigate how to efficiently select training sets within this space. We show that simply using the BBOB component functions for training yields poor test performance, while the ranking between uniformly chosen and diversity-based training sets strongly depends on the distribution of the test set.
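The potential gain from AAS is commonly quantified as the gap between the single best solver (SBS, the one algorithm with the best average performance over all instances) and the virtual best solver (VBS, an oracle that picks the best algorithm for each instance). The following is a minimal sketch of that computation, not the authors' implementation. It assumes a performance matrix in which lower values are better; the function names `vbs_sbs_gap` and `gap_closed` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def vbs_sbs_gap(perf):
    """Return (vbs, sbs, gap) for a performance matrix.

    perf has shape (n_instances, n_solvers); lower is better (assumption).
    """
    perf = np.asarray(perf, dtype=float)
    vbs = perf.min(axis=1).mean()   # oracle: best solver on each instance
    sbs = perf.mean(axis=0).min()   # best single solver on average
    return vbs, sbs, sbs - vbs


def gap_closed(perf, chosen):
    """Fraction of the VBS-SBS gap closed by a selector.

    chosen[i] is the index of the solver the selector picked for instance i.
    1.0 means the selector matches the VBS, 0.0 means it matches the SBS,
    and negative values mean it is worse than always running the SBS.
    """
    perf = np.asarray(perf, dtype=float)
    chosen = np.asarray(chosen, dtype=int)
    vbs, sbs, gap = vbs_sbs_gap(perf)
    if gap == 0:
        return float("nan")  # no complementarity, nothing to gain
    selector = perf[np.arange(perf.shape[0]), chosen].mean()
    return (sbs - selector) / gap


# Toy data: 3 instances, 2 solvers (illustrative values only).
toy = [[1.0, 4.0],
       [3.0, 2.0],
       [2.0, 5.0]]
vbs, sbs, gap = vbs_sbs_gap(toy)
perfect = gap_closed(toy, [0, 1, 0])   # selector always picks the best solver
always_0 = gap_closed(toy, [0, 0, 0])  # selector always picks solver 0 (the SBS)
```

Evaluating `gap_closed` on a held-out test set, rather than on the training instances, is what makes the effect of training-set selection visible.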
