Generalization Ability of Feature-based Performance Prediction Models: A Statistical Analysis across Benchmarks
Ana Nikolikj, Ana Kostovska, Gjorgjina Cenikj, Carola Doerr, Tome Eftimov
TL;DR
This work tackles the problem of generalizing feature-based performance predictors across benchmark suites by introducing a statistical framework that preserves high-dimensional information. It maps problem instances to a shared $n$-dimensional meta-feature space and uses the multivariate $\mathcal{E}$ test to compare the training and testing distributions, linking cross-suite similarity to predictive transfer. Two experiments, one with the standard BBOB and CEC suites and another with affine recombinations of BBOB instances, show two things. When the landscape-feature distributions of the training and testing suites are not statistically different, cross-suite prediction errors stay within the range of the training errors. When the distributions differ significantly, accuracy degrades. The study contributes a principled, information-preserving method for anticipating whether performance predictors will transfer. It also highlights the potential of combining this statistical view with traditional empirical coverage analyses to guide feature design and benchmark selection.
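The comparison step can be illustrated with a minimal sketch of a two-sample energy ($\mathcal{E}$) test in the style of Székely and Rizzo, with a permutation p-value. This is not the authors' implementation. It assumes each row of `x` and `y` is one problem instance's meta-feature vector in the shared $n$-dimensional space. The function names, the V-statistic form, the permutation count, and the seed are hypothetical choices made for illustration.

```python
import numpy as np


def _mean_pairwise_dist(a, b):
    # Mean Euclidean distance over all pairs (a_i, b_j), diagonal included (V-statistic form)
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_statistic(x, y):
    # Energy distance between the empirical distributions of x (n x d) and y (m x d),
    # scaled by nm / (n + m) to give the two-sample E-test statistic
    n, m = len(x), len(y)
    e = 2 * _mean_pairwise_dist(x, y) - _mean_pairwise_dist(x, x) - _mean_pairwise_dist(y, y)
    return n * m / (n + m) * e


def energy_test(x, y, n_perm=199, seed=0):
    # Permutation test: reshuffle the pooled instances into groups of the original sizes
    # and count how often the permuted statistic reaches the observed one
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if energy_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

Usage, with synthetic feature matrices standing in for real landscape features: `energy_test(train_features, test_features)` returns the statistic and a p-value. Under the paper's reasoning, a small p-value (significantly different distributions) would warn that a predictor trained on the first suite may not transfer to the second. The permutation approach is used here because it needs no distributional assumptions on the features.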
Abstract
This study examines the generalization ability of algorithm performance prediction models across various benchmark suites. Comparing the statistical similarity between problem collections with the accuracy of performance prediction models based on exploratory landscape analysis features, we observe a positive correlation between these two measures. Specifically, when the difference between the high-dimensional feature value distributions of the training and testing suites is not statistically significant, the model tends to generalize well, in the sense that the testing errors are in the same range as the training errors. Two experiments validate these findings: one involving the standard benchmark suites, the BBOB and CEC collections, and another using five collections of affine combinations of BBOB problem instances.
