
Algorithm Instance Footprint: Separating Easily Solvable and Challenging Problem Instances

Ana Nikolikj, Sašo Džeroski, Mario Andrés Muñoz, Carola Doerr, Peter Korošec, Tome Eftimov

TL;DR

In black-box optimization, the paper addresses the challenge of explaining why a given algorithm instance succeeds on some problem instances and fails on others. It introduces an algorithm instance footprint built from meta-representations that combine landscape features with algorithm performance. A supervised regression model predicts performance from the landscape features, and SHAP explanations of its predictions form the meta-representations. Deterministic clustering of these meta-representations then locates regions of good and poor performance. The four-step pipeline (modeling, meta-representation extraction, clustering into four categories of good/poor algorithm performance and good/poor model accuracy, and post-hoc analysis) is demonstrated on the COCO/BBOB benchmark at dimension $D=10$. The experiments use a fixed budget of $500\cdot D$ function evaluations and an ML-model error tolerance of $p=0.15$, across multiple differential evolution (DE) configurations. This approach provides explainable, instance-level insights into algorithm behavior, enabling targeted improvements and offering a foundation for model-agnostic benchmarking and future multi-task analyses.
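The four-way assignment can be illustrated with a minimal sketch. It assumes, hypothetically, that algorithm performance is a scalar where lower is better and counts as "good" when it is at or below a user-chosen threshold. It also assumes that the tolerance $p=0.15$ acts as a bound on the model's relative prediction error. This summary does not spell out either rule, and the names `InstanceResult`, `footprint_category`, `footprint`, and `perf_threshold` are illustrative, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    name: str
    true_perf: float  # observed algorithm performance (assumption: lower is better)
    pred_perf: float  # performance predicted by the regression model


def footprint_category(r: InstanceResult, perf_threshold: float, p: float = 0.15) -> tuple:
    """Return (algorithm performance, model performance), each 'good' or 'poor'."""
    # Hypothetical rule: the algorithm does well when the observed value
    # is at or below the chosen threshold.
    algo = "good" if r.true_perf <= perf_threshold else "poor"
    # Assumed reading of the tolerance p: relative prediction error within p.
    # The max() guard avoids division by zero when the true value is 0.
    rel_err = abs(r.pred_perf - r.true_perf) / max(abs(r.true_perf), 1e-12)
    model = "good" if rel_err <= p else "poor"
    return (algo, model)


def footprint(results, perf_threshold: float, p: float = 0.15) -> dict:
    """Group instance names by their (algorithm, model) category."""
    groups = {}
    for r in results:
        key = footprint_category(r, perf_threshold, p)
        groups.setdefault(key, []).append(r.name)
    return groups


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    results = [
        InstanceResult("inst_a", 1.0, 1.1),
        InstanceResult("inst_b", 0.5, 1.0),
        InstanceResult("inst_c", 4.0, 4.2),
        InstanceResult("inst_d", 5.0, 3.0),
    ]
    print(footprint(results, perf_threshold=2.0))
```

Because the rule involves only thresholds and no random initialization, the same inputs always produce the same grouping. That determinism is what separates this kind of assignment from, say, k-means.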

Abstract

In black-box optimization, it is essential to understand why an algorithm instance works on some problem instances while failing on others, and to explain its behavior. We propose a methodology for formulating the footprint of an algorithm instance: the set of problem instances it solves easily and the set of problem instances it finds difficult. This behavior is further linked to the landscape properties of the problem instances, to explain which properties make some problem instances easy or challenging. The proposed methodology uses meta-representations that embed the landscape properties of the problem instances and the performance of the algorithm into the same vector space. These meta-representations are obtained by training a supervised machine learning regression model for algorithm performance prediction and applying model explainability techniques to assess the importance of the landscape features to the performance predictions. Deterministic clustering of the meta-representations then shows that they capture algorithm performance across the problem space. It detects regions of good and poor algorithm performance and explains which landscape properties lead to them.
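A per-instance meta-representation can be pictured as the vector of Shapley values that attribute a model's prediction to each landscape feature. The sketch below computes exact Shapley values by enumerating feature subsets. It uses a simplifying single-baseline value function: features outside a subset are replaced by baseline values. The toy model is hypothetical and stands in for the paper's regression model. In practice a tree-specific SHAP algorithm would be used, because exact enumeration costs $O(2^n)$ model calls for $n$ features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in `subset` take the instance's values; the rest take baseline values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            for combo in combinations(others, k):
                subset = set(combo)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


def meta_representations(model, instances, baseline) -> List[List[float]]:
    """One Shapley vector per problem instance (hypothetical helper)."""
    return [shapley_values(model, x, baseline) for x in instances]


if __name__ == "__main__":
    # Hypothetical model over three landscape features.
    toy_model = lambda z: 2 * z[0] + z[1] * z[2]
    print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For this toy model the additive term goes entirely to feature 0 (about 2.0). The interaction term $z_1 z_2$ is split equally between features 1 and 2 (about 3.0 each). The values sum to $f(x) - f(\text{baseline}) = 8$, which is the efficiency property that makes the Shapley vector a faithful decomposition of the prediction.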

Paper Structure (5 sections, 6 figures, 2 tables)

This paper contains 5 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Flowchart of the methodology for calculating and analyzing algorithm instance footprint.
  • Figure 2: Box-plot showing the distribution of model performance over the test portion of the five folds: (a) MAE, (b) $R^2$ score, for different feature portfolios of the most important features as identified by the SHAP method, when predicting the performance of DE1.
  • Figure 3: 2D UMAP [mcinnes2018umap] visualization of the algorithm footprints obtained with the deterministic clustering, on the test portion of each of the five folds. The error tolerance for the random forest (RF) model is 15%. Blue marks regions of good algorithm performance and yellow marks regions of poor algorithm performance. The marker shape corresponds to good (O) and poor (X) ML model performance, as indicated by the legend at the bottom of the plot.
  • Figure 4: The 10 most important ELA features and their prediction influence for the test instances of the first and second fold for the (good, good) and (poor, good) clusters. Each point on the plot is a Shapley value for a feature and an instance. Its position on the y-axis is determined by the feature and on the x-axis by the Shapley value. The color represents the value of the feature from low to high.
  • Figure 5: The distribution of two ELA features, randomly selected from the top 10, across the algorithm instance footprint. The color in the plots represents the normalized feature values.
  • ...and 1 more figure
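Figures 2 and 4 rely on two standard computations: regression error metrics (MAE and $R^2$) per fold, and a ranking of landscape features by importance to build top-$k$ feature portfolios. The sketch below shows both in plain Python. It assumes the common convention of ranking features by mean absolute Shapley value, although the summary does not state which aggregation the authors use. The function and feature names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def top_k_features(shap_matrix: List[List[float]],
                   feature_names: List[str], k: int) -> List[str]:
    """Rank features by mean |Shapley value| over instances (rows); keep the top k."""
    n_feat = len(feature_names)
    importance = [fmean(abs(row[j]) for row in shap_matrix) for j in range(n_feat)]
    order = sorted(range(n_feat), key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in order[:k]]


if __name__ == "__main__":
    print(mae([1, 2, 3], [1, 2, 4]))
    print(r2_score([1, 2, 3], [2, 2, 2]))
    shap = [[0.1, -2.0, 0.5], [0.3, 1.0, -0.4]]  # toy values: 2 instances x 3 features
    print(top_k_features(shap, ["feat_a", "feat_b", "feat_c"], k=2))
```

Retraining the model on each top-$k$ portfolio and recomputing these metrics on every test fold would yield one box-plot entry per portfolio size, in the style of Figure 2.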