Algorithm Instance Footprint: Separating Easily Solvable and Challenging Problem Instances
Ana Nikolikj, Sašo Džeroski, Mario Andrés Muñoz, Carola Doerr, Peter Korošec, Tome Eftimov
TL;DR
In black-box optimization, the paper addresses the challenge of explaining why a given algorithm instance succeeds on some problem instances and fails on others. It introduces an algorithm instance footprint built from meta-representations that fuse landscape features with predicted performance via a supervised regression model and SHAP explanations, followed by deterministic clustering to locate regions of good and poor performance. The four-step pipeline (modeling, meta-representation extraction, clustering into four categories, and post-hoc analysis) is demonstrated on COCO/BBOB benchmarks at $D=10$ with a fixed budget of $500\cdot D$ evaluations and an ML tolerance of $p=0.15$, across multiple differential evolution configurations. This approach provides explainable, instance-level insights into algorithm behavior, enabling targeted improvements and offering a foundation for model-agnostic benchmarking and future multi-task analyses.
Abstract
In black-box optimization, it is essential to understand why an algorithm instance works on some problem instances while failing on others, and to explain its behavior. We propose a methodology for formulating an algorithm instance footprint, which consists of a set of problem instances that the algorithm instance solves easily and a set that it finds difficult to solve. This behavior is then linked to the landscape properties of the problem instances to explain which properties make some problem instances easy or challenging. The proposed methodology uses meta-representations that embed the landscape properties of the problem instances and the performance of the algorithm into the same vector space. These meta-representations are obtained by training a supervised machine learning regression model for algorithm performance prediction and applying model explainability techniques to assess the importance of the landscape features to the performance predictions. Next, deterministic clustering of the meta-representations shows that they capture algorithm performance across the space and reveals regions of poor and good algorithm performance, together with an explanation of which landscape properties lead to each.
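The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the data are synthetic stand-ins for landscape features and algorithm performance; the regressor is a plain linear model, chosen because its SHAP values have a closed form when features are treated as independent; the per-instance attribution vector is taken as one plausible reading of a "meta-representation"; and agglomerative (Ward) clustering with four clusters stands in for the deterministic clustering step. The function names `meta_representations` and `cluster_footprint` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LinearRegression


def meta_representations(X, y):
    """Fit a performance regressor and return per-instance feature attributions."""
    model = LinearRegression().fit(X, y)
    # Assumption: for a linear model with features treated as independent,
    # the SHAP value of feature j on instance i is w_j * (x_ij - mean_j).
    phi = (X - X.mean(axis=0)) * model.coef_
    return model, phi


def cluster_footprint(phi, n_clusters=4):
    """Group instances by attribution pattern (no random initialization involved)."""
    clusterer = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return clusterer.fit_predict(phi)


# Synthetic stand-in data: 120 problem instances, 5 hypothetical landscape features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=120)

model, phi = meta_representations(X, y)
labels = cluster_footprint(phi)

# Post-hoc view: cluster size, mean observed performance, and the feature
# with the largest mean absolute attribution inside each cluster.
for c in np.unique(labels):
    members = labels == c
    top = int(np.abs(phi[members]).mean(axis=0).argmax())
    print(f"cluster {c}: n={members.sum()}, mean y={y[members].mean():.3f}, top feature={top}")
```

Because each attribution vector sums to the gap between an instance's predicted performance and the mean prediction, clustering these vectors groups instances by both how well the algorithm is predicted to perform and which features drive that prediction. A nonlinear regressor would need a general-purpose explainer in place of the closed form used here.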
