Impact of Training Instance Selection on Automated Algorithm Selection Models for Numerical Black-box Optimization

Konstantin Dietrich; Diederick Vermetten; Carola Doerr; Pascal Kerschke

TL;DR

This work evaluates how training instance selection affects automated algorithm selection (AAS) for numerical black-box optimization using the MA-BBOB benchmark, generating $11{,}800$ functions per dimension in $d\in\{2,5\}$ and analyzing eight solvers with 74 ELA features. The study shows that training on BBOB component functions yields poor generalization and that the match between training and test distributions dominates AAS performance, though this can be mitigated by larger training sets. Three training-set selection strategies (uniform sampling, greedy diversity-based sampling, and the BBOB component functions of MA-BBOB) are compared on an eight-solver portfolio, revealing that the shape of the distribution and the sampling strategy critically influence the learned selector's ability to close the VBS-SBS gap on unseen data. The results highlight practical considerations for deploying AAS tools in real-world optimization, including the trade-off between data collection cost and predictive accuracy, and suggest directions for heterogeneous training data strategies and broader benchmarking beyond academic suites.

Abstract

The recently proposed MA-BBOB function generator provides a way to create numerical black-box benchmark problems based on the well-established BBOB suite. Initial studies on this generator highlighted its ability to smoothly transition between the component functions, both from a low-level landscape feature perspective and with regard to algorithm performance. This suggests that MA-BBOB-generated functions can be an ideal testbed for automated machine learning methods, such as automated algorithm selection (AAS). In this paper, we generate 11 800 functions in each of the dimensions $d=2$ and $d=5$, and analyze the potential gains from AAS by studying performance complementarity within a set of eight algorithms. We combine this performance data with exploratory landscape features to create an AAS pipeline that we use to investigate how to efficiently select training sets within this space. We show that simply using the BBOB component functions for training yields poor test performance, while the ranking between uniformly chosen and diversity-based training sets strongly depends on the distribution of the test set.
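To make the contrast between uniformly chosen and diversity-based training sets concrete, the following is a minimal sketch, not the authors' procedure. It assumes each function is represented by a numeric feature vector (for example, normalized ELA features) and uses greedy farthest-point selection under Euclidean distance as a stand-in for a diversity-based strategy. The function names, the distance measure, the starting point, and the toy data are all illustrative assumptions.

```python
import math
import random


def uniform_sample(n_items, k, seed=0):
    """Pick k distinct indices uniformly at random (assumed baseline strategy)."""
    rng = random.Random(seed)
    return rng.sample(range(n_items), k)


def greedy_diverse_sample(features, k, start=0):
    """Greedy farthest-point selection (an assumed diversity heuristic).

    Repeatedly adds the point whose distance to its nearest already
    selected point is largest, which spreads the set over feature space.
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(f, features[start]) for f in features]
    while len(selected) < k:
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(f, features[nxt]))
            for d, f in zip(min_dist, features)
        ]
    return selected


# Toy 2-d feature vectors: three points cluster near the origin.
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (0.05, 0.05)]
diverse = greedy_diverse_sample(features, 3)
uniform = uniform_sample(len(features), 3)
```

On the toy data, the greedy variant skips the points clustered near the origin in favor of the two distant ones, whereas uniform sampling tends to land in the dense region. This mirrors the qualitative difference described for the two strategies.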

Paper Structure (16 sections, 8 figures, 1 table)

Introduction
The MA-BBOB Generator
Experimental Setup
Function Generation
ELA calculation and selection
Training instance selection for AAS
Performance Data Collection
Reproducibility
Results
Complementarity analysis
Problem properties
Solver performance
Automated Algorithm Selection
Impact of sampling strategy
Impact of training data set size
...and 1 more sections

Figures (8)

Figure 1: Boxplots of the selected ELA feature values of the two-dimensional BBOB function set (blue), and the two-dimensional function subsets with 2 (orange) and 24 (green) active component functions, respectively. Generally, the value ranges of the subgroups do not differ strongly from each other. Instances with 24 active component functions become very similar, which is reflected in the narrowed feature value ranges.
Figure 2: PCA projection of the ELA feature vectors of the 11 800 MA-BBOB functions (small dots) and the 120 BBOB instances (larger dots), for 2$d$ (top) and 5$d$ (bottom). Projections are fitted using the BBOB functions. The colorbar on the bottom corresponds to the BBOB function ID, whereas the one on the top is used to display how many functions were combined to create the respective MA-BBOB function. The MA-BBOB functions fill the feature space spanned by the BBOB functions but mostly remain within its convex hull.
Figure 3: Visualization of the selected two-dimensional instance sets, using the same PCA projection as in Fig. 2. The left column shows five of the uniformly sampled sets of size 120 and the right column shows five of the greedily sampled sets. Colored points are BBOB functions, using the colorbar from the bottom of Fig. 2. As expected, the greedily sampled function sets are more evenly spread in feature space while the uniform randomly sampled sets are more likely to occupy the area with the highest function density.
Figure 4: AOCC of the SBS on the $x$-axis vs. AOCC of the other algorithms on the $y$-axis. Each blue dot corresponds to one of the 11 800 functions in 2$d$. BBOB functions are colored according to the colorbar in Fig. 2. All points above the diagonals correspond to instances for which the respective solver beats the SBS. The fraction of instances on which at least one algorithm outperforms the SBS is 0.46.
Figure 5: Potential of algorithm selection to improve over the SBS, measured in average AOCC improvement of the VBS over the SBS, for all portfolios using at least 3 of the 8 algorithms for which we have collected performance data. Data points are grouped by SBS (columns) and by portfolio size (symbol). The modCMA algorithm is very dominant in our portfolio, leaving little room for algorithm selection. When removed from the portfolio, the average VBS-SBS gap is 0.111, the second largest value obtained across subsets.
...and 3 more figures
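
The quantities reported in the captions of Figures 4 and 5 can be illustrated with a short sketch. It assumes a performance table with one row per function and one column per solver, holding AOCC values where higher is better. The SBS (single best solver) is taken as the solver with the highest mean AOCC and the VBS (virtual best solver) as the per-function best. The function names and the toy numbers are illustrative assumptions, not data from the paper.

```python
from statistics import mean


def sbs_index(perf):
    """Column index of the single best solver (highest mean AOCC)."""
    n_algs = len(perf[0])
    means = [mean(row[a] for row in perf) for a in range(n_algs)]
    return max(range(n_algs), key=means.__getitem__)


def vbs_sbs_gap(perf):
    """Average AOCC improvement of the VBS over the SBS."""
    sbs = sbs_index(perf)
    vbs_score = mean(max(row) for row in perf)   # best solver per function
    sbs_score = mean(row[sbs] for row in perf)   # one solver for all functions
    return vbs_score - sbs_score


def fraction_beating_sbs(perf):
    """Share of functions on which some solver strictly beats the SBS."""
    sbs = sbs_index(perf)
    return mean(1.0 if max(row) > row[sbs] else 0.0 for row in perf)


# Toy AOCC table: 4 functions (rows) x 3 solvers (columns), made-up values.
perf = [
    [0.9, 0.5, 0.4],
    [0.8, 0.9, 0.3],
    [0.7, 0.6, 0.2],
    [0.6, 0.4, 0.8],
]
gap = vbs_sbs_gap(perf)
frac = fraction_beating_sbs(perf)
```

A larger gap means more room for an algorithm selector to improve over always running the SBS. A portfolio dominated by a single solver yields a small gap, as the Figure 5 caption notes for modCMA.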
