Benchmarking that Matters: Rethinking Benchmarking for Practical Impact

Anna V. Kononova, Niki van Stein, Olaf Mersmann, Thomas Bäck, Thomas Bartz-Beielstein, Tobias Glasmachers, Michael Hellwig, Sebastian Krey, Jakub Kůdela, Boris Naujoks, Leonard Papenmeier, Elena Raponi, Quentin Renau, Jeroen Rook, Lennart Schäpermeier, Diederick Vermetten, Daniela Zaharie

TL;DR

The paper argues that benchmarking for continuous and mixed-integer optimization is not yet aligned with real-world needs, highlighting the gap between academically oriented, synthetic testbeds and industry requirements. It proposes a vision for real-world-inspired benchmarks (RWI), supported by a taxonomy of high-level problem features, curated problem collections, and community-driven tooling and data repositories that enable trustworthy, decision-focused solver selection. Key contributions include a framework for transversal benchmarking that uses feature vectors and distance measures to match real-world problems with appropriate benchmarks and algorithms, and a blueprint for an ecosystem of modular tooling, data validation, and living performance databases. The work aims to narrow the gap between theory and practice by creating an impact-oriented benchmarking culture where industry feedback continuously informs benchmark design and academic research.
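The matching idea mentioned above (feature vectors plus a distance measure) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the benchmark names, the numeric values, and the choice of Euclidean distance are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical high-level features, each assumed to be scaled to [0, 1].
FEATURES = ("dimension", "integer_fraction", "constrained", "noisy")

# Hypothetical benchmark descriptors; names and values are illustrative only.
BENCHMARKS = {
    "synthetic_continuous": {
        "dimension": 0.2, "integer_fraction": 0.0, "constrained": 0.0, "noisy": 0.0,
    },
    "rwi_mixed_integer": {
        "dimension": 0.6, "integer_fraction": 0.5, "constrained": 1.0, "noisy": 0.0,
    },
    "rwi_noisy_simulation": {
        "dimension": 0.4, "integer_fraction": 0.0, "constrained": 1.0, "noisy": 1.0,
    },
}


def distance(a, b):
    """Euclidean distance between two feature dictionaries (an assumed choice)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


def rank_benchmarks(problem):
    """Return benchmark names ordered from most to least similar to the problem."""
    return sorted(BENCHMARKS, key=lambda name: distance(problem, BENCHMARKS[name]))


# A hypothetical practitioner problem described by the same features.
problem = {"dimension": 0.5, "integer_fraction": 0.4, "constrained": 1.0, "noisy": 0.1}
ranking = rank_benchmarks(problem)
print(ranking[0])  # closest benchmark under the assumed features and distance
```

In such a setup, the closest benchmarks would point a practitioner to stored performance data for problems resembling their own; which features and which distance are appropriate is exactly the kind of question the paper leaves to the community.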

Abstract

Benchmarking has driven scientific progress in Evolutionary Computation, yet current practices fall short of real-world needs. Widely used synthetic suites such as BBOB and CEC isolate algorithmic phenomena but poorly reflect the structure, constraints, and information limitations of continuous and mixed-integer optimization problems in practice. This disconnect leads to the misuse of benchmarking suites for competitions, automated algorithm selection, and industrial decision-making, despite these suites being designed for different purposes. We identify key gaps in current benchmarking practices and tooling, including limited availability of real-world-inspired problems, missing high-level features, and challenges in multi-objective and noisy settings. We propose a vision centered on curated real-world-inspired benchmarks, practitioner-accessible feature spaces and community-maintained performance databases. Real progress requires coordinated effort: A living benchmarking ecosystem that evolves with real-world insights and supports both scientific understanding and industrial use.

Paper Structure

This paper contains 20 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Logic flowchart illustrating two worlds of benchmarking: Academia focuses on understanding and comparing algorithmic performance via RWI benchmarks, while industry aims to select effective algorithms with minimal resources using offline databases. Dashed arrows denote feedback loops where performance data fine-tune and validate benchmarks.