Benchmarking that Matters: Rethinking Benchmarking for Practical Impact
Anna V. Kononova, Niki van Stein, Olaf Mersmann, Thomas Bäck, Thomas Bartz-Beielstein, Tobias Glasmachers, Michael Hellwig, Sebastian Krey, Jakub Kůdela, Boris Naujoks, Leonard Papenmeier, Elena Raponi, Quentin Renau, Jeroen Rook, Lennart Schäpermeier, Diederick Vermetten, Daniela Zaharie
TL;DR
The paper argues that benchmarking for continuous and mixed-integer optimization is not yet aligned with real-world needs, highlighting the gap between academically oriented, synthetic testbeds and industry requirements. It proposes a vision for real-world-inspired (RWI) benchmarks, supported by a taxonomy of high-level problem features, curated problem collections, and community-driven tooling and data repositories that enable trustworthy, decision-focused solver selection. Key contributions include a framework for transversal benchmarking that uses feature vectors and distance measures to match real-world problems with appropriate benchmarks and algorithms, and a blueprint for an ecosystem of modular tooling, data validation, and living performance databases. The work aims to narrow the gap between theory and practice by creating an impact-oriented benchmarking culture in which industry feedback continuously informs benchmark design and academic research.
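To make the idea of matching problems to benchmarks via feature vectors and distance measures concrete, here is a minimal sketch. It is not the paper's framework: the feature names, their scaling to [0, 1], the benchmark names, the feature values, and the choice of Euclidean distance are all assumptions made for illustration only.

```python
import math

# Hypothetical high-level features, each assumed to be scaled to [0, 1].
# These names are illustrative and are not taken from the paper's taxonomy.
FEATURES = ("dimensionality", "constraint_share", "noise_level", "integer_share")


def distance(a, b):
    """Euclidean distance between two feature dicts over FEATURES."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


def nearest_benchmarks(problem, benchmarks, k=1):
    """Return the names of the k benchmarks closest to the problem in feature space."""
    ranked = sorted(benchmarks, key=lambda name: distance(problem, benchmarks[name]))
    return ranked[:k]


# Made-up feature vectors for two hypothetical benchmark problems.
benchmarks = {
    "synthetic_unconstrained": {
        "dimensionality": 0.2, "constraint_share": 0.0,
        "noise_level": 0.0, "integer_share": 0.0,
    },
    "rwi_constrained_noisy": {
        "dimensionality": 0.6, "constraint_share": 0.7,
        "noise_level": 0.5, "integer_share": 0.3,
    },
}

# Made-up feature vector for a practitioner's real-world problem.
problem = {
    "dimensionality": 0.5, "constraint_share": 0.6,
    "noise_level": 0.4, "integer_share": 0.2,
}

closest = nearest_benchmarks(problem, benchmarks)
```

In this toy setup the closest benchmark is the hypothetical constrained, noisy one, so solver performance recorded on it would be the more relevant evidence for the practitioner's problem. A real system would need validated features and a distance measure justified for the problem class.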
Abstract
Benchmarking has driven scientific progress in Evolutionary Computation, yet current practices fall short of real-world needs. Widely used synthetic suites such as BBOB and CEC isolate algorithmic phenomena but poorly reflect the structure, constraints, and information limitations of continuous and mixed-integer optimization problems in practice. This disconnect leads to the misuse of benchmarking suites for competitions, automated algorithm selection, and industrial decision-making, despite these suites being designed for different purposes. We identify key gaps in current benchmarking practices and tooling, including limited availability of real-world-inspired problems, missing high-level features, and challenges in multi-objective and noisy settings. We propose a vision centered on curated real-world-inspired benchmarks, practitioner-accessible feature spaces, and community-maintained performance databases. Real progress requires coordinated effort: a living benchmarking ecosystem that evolves with real-world insights and supports both scientific understanding and industrial use.
