Large-scale Benchmarking of Metaphor-based Optimization Heuristics
Diederick Vermetten, Carola Doerr, Hao Wang, Anna V. Kononova, Thomas Bäck
TL;DR
This study addresses the challenge of evaluating a rapidly growing set of metaphor-based optimization heuristics by benchmarking 294 implementations on the BBOB function suite across multiple problem dimensionalities. Using both anytime (AOCC) and fixed-budget (AOCClarge) performance metrics, the authors analyze how design choices such as the budget and the choice of baselines influence comparisons, and they reveal substantial variability and complementarity within the algorithm portfolio. Key contributions include a comprehensive performance landscape, a Shapley-value assessment of each algorithm's contribution to the portfolio, and insights into how benchmarking data can guide algorithm selection, hybridization, and future research, with an emphasis on reproducibility throughout. The work shows that many methods do not outperform random search on easy problems, and it demonstrates practical tools and methodologies for fair, systematic, data-driven evaluation of optimization heuristics.
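To make the idea of a Shapley-value assessment of portfolio contributions concrete, the sketch below computes exact Shapley values for a tiny portfolio. It assumes, as an illustration only, that the value of a coalition of algorithms is the mean per-problem score of its virtual best solver (the best score any member reaches on each problem), with the empty coalition worth 0. The function names `vbs_value` and `shapley_values` and the scores are hypothetical and not taken from the paper. Exact enumeration over all coalitions is only feasible for a handful of algorithms; a portfolio of 294 would require a sampling-based approximation.

```python
from itertools import combinations
from math import factorial


def vbs_value(coalition, scores):
    """Mean per-problem score of the virtual best solver of `coalition` (assumed value function)."""
    if not coalition:
        return 0.0
    n_problems = len(next(iter(scores.values())))
    return sum(max(scores[a][p] for a in coalition) for p in range(n_problems)) / n_problems


def shapley_values(scores):
    """Exact Shapley value of each algorithm, enumerating every coalition of the others."""
    algos = list(scores)
    n = len(algos)
    phi = {}
    for a in algos:
        others = [b for b in algos if b != a]
        total = 0.0
        for k in range(n):  # k = size of the coalition that `a` joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                marginal = vbs_value(coalition + (a,), scores) - vbs_value(coalition, scores)
                total += weight * marginal
        phi[a] = total
    return phi


# Hypothetical per-problem scores (e.g. AOCC values) on two problems.
scores = {"A": [0.9, 0.1], "B": [0.1, 0.9], "C": [0.8, 0.2]}
phi = shapley_values(scores)
for name, value in phi.items():
    print(f"{name}: {value:.4f}")
```

All three hypothetical algorithms have the same standalone value (0.5), yet B receives the largest share because it is the only strong solver on the second problem, while C is largely redundant once A is present. This is the kind of complementarity that averaged rankings can hide.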
Abstract
The number of proposed iterative optimization heuristics is growing steadily, and with this growth, there have been many points of discussion within the wider community. One particular criticism raised against many new algorithms is their focus on the metaphors used to present the method, rather than on their potential algorithmic contributions. Several studies into popular metaphor-based algorithms have highlighted these problems, even showcasing algorithms that are functionally equivalent to older existing methods. Unfortunately, such detailed analysis does not scale to the whole set of metaphor-based algorithms. We therefore investigate ways in which benchmarking can shed light on these algorithms. To this end, we run a set of 294 algorithm implementations on the BBOB function suite. We investigate how the choice of the budget, the performance measure, or other aspects of experimental design impact the comparison of these algorithms. Our results emphasize why benchmarking is a key step in expanding our understanding of the algorithm space, and what challenges still need to be overcome to fully gauge the potential improvements to the state of the art hiding behind the metaphors.
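The dependence of a comparison on the budget can be illustrated with a small sketch of an anytime measure. It assumes a common AOCC-style definition (not confirmed by this text): the best-so-far precision f_best - f_opt after each evaluation is clipped to [1e-8, 1e2], scaled logarithmically to [0, 1], and averaged over the first `budget` evaluations, so that 1 means the optimum was reached immediately. The function `aocc` and both traces are hypothetical.

```python
import math
from typing import Sequence


def aocc(f_values: Sequence[float], f_opt: float, budget: int,
         lb: float = 1e-8, ub: float = 1e2) -> float:
    """Area over the convergence curve over the first `budget` evaluations (assumed definition)."""
    log_lb, log_ub = math.log10(lb), math.log10(ub)
    best = math.inf
    total = 0.0
    for i in range(budget):
        if i < len(f_values):
            best = min(best, f_values[i])  # best-so-far value; carried forward past the trace end
        precision = min(max(best - f_opt, lb), ub)  # clip precision to [lb, ub]
        total += 1.0 - (math.log10(precision) - log_lb) / (log_ub - log_lb)
    return total / budget


# Hypothetical traces on a problem with f_opt = 0:
# A improves quickly but stalls at precision 1e-2; B needs 20 evaluations, then hits the optimum.
trace_a = [1.0, 0.1] + [0.01] * 98
trace_b = [100.0] * 20 + [0.0] * 80

for budget in (10, 100):
    print(budget, round(aocc(trace_a, 0.0, budget), 3), round(aocc(trace_b, 0.0, budget), 3))
```

With a budget of 10, A scores about 0.37 and B scores 0, whereas with a budget of 100 the ranking flips (about 0.397 for A versus 0.8 for B). The same pair of runs can therefore support opposite conclusions depending on the budget chosen.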
