Time to Stop and Think: What kind of research do we want to do?
Josu Ceberio, Borja Calvo
TL;DR
This paper addresses the quality of experimental practice in metaheuristic optimization, arguing that current work often follows inertial traditions that undermine its validity. It distinguishes engineering, problem-focused research from scientific, knowledge-seeking inquiry, and discusses how the choice of benchmarks, instance generators, and data analysis methods shapes conclusions. The authors advocate accumulating broad observational data, avoiding cherry-picked instances, and using experiments to generate hypotheses about algorithm behavior rather than simply chasing state-of-the-art results on artificial benchmarks. By promoting either real-world relevance or explanatory understanding, the paper aims to improve reproducibility, interpretation, and the long-term progress of knowledge about metaheuristics.
Abstract
Experimentation is an intrinsic part of research in artificial intelligence, since it allows us to collect quantitative observations, validate hypotheses, and provide evidence for their reformulation. For that reason, experimentation must be coherent with the purposes of the research, properly addressing the relevant questions in each case. Unfortunately, the literature is full of works whose experimentation is neither rigorous nor convincing, oftentimes designed to support prior beliefs rather than to answer the relevant research questions. In this paper, we focus on the field of metaheuristic optimization, since it is our main field of work and it is where we have observed the misconduct that motivated this letter. Although we limit the focus of this manuscript to the experimental part of the research, our main goal is to sow the seed of a sincere critical assessment of our work, sparking a reflection process at both the individual and the community level. Such a reflection process is too complex and extensive to be tackled as a whole. Therefore, to keep our feet on the ground, we include in this document our reflections on the role of experimentation in our work, discussing topics such as the use of benchmark instances vs. instance generators, or the statistical assessment of empirical results. Note that all the statements in this document are personal views and opinions, which others may or may not share. Certainly, having different points of view is the basis for a good discussion.
