Mind the Gap: Measuring Generalization Performance Across Multiple Objectives
Matthias Feurer, Katharina Eggensperger, Edward Bergman, Florian Pfisterer, Bernd Bischl, Frank Hutter
TL;DR
Mind the Gap addresses how to measure the generalization of multi-objective hyperparameter optimization (MHPO) when moving from the validation set to the test set. It introduces optimistic and pessimistic Pareto fronts, defined via test-set dominance $\prec_{test}$ on the validation-derived front, and uses the resulting hypervolume gap as a robustness metric. The paper formalizes the evaluation protocol, demonstrates the existence of the approximation gap in experiments, and shows that these notions enable reliable comparisons between two MHPO algorithms. This framework provides a practical tool for robust multi-objective model selection and generalization assessment in areas such as neural architecture search (NAS) and AutoML.
Abstract
Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from the validation to the test set. However, some of the models might no longer be Pareto-optimal on the test set, which makes it unclear how to quantify the performance of the MHPO method when evaluated there. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods, and we study its capabilities for comparing two optimization experiments.
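The following minimal sketch illustrates the problem described above on made-up numbers. It is not the authors' implementation. All model scores, the reference point, and the helper names (`dominates`, `non_dominated`, `non_dominating`, `hypervolume_2d`) are hypothetical. Both objectives are minimized, with error standing in for 1 minus accuracy. The optimistic set is read here as the validation-front models that no other such model dominates on the test set, and the pessimistic set as those that dominate no other such model on the test set; this is one plausible reading of the summary, not a definition quoted from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (error, inference time), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Point]) -> List[int]:
    """Indices of points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def non_dominating(points: Sequence[Point]) -> List[int]:
    """Indices of points that dominate no other point (assumed pessimistic set)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(p, q) for j, q in enumerate(points) if j != i)]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by the points and bounded by ref (two minimized objectives)."""
    front = sorted(points[i] for i in non_dominated(points))
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if x < ref[0] and y < prev_y:
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area


# Hypothetical scores for five candidate models (index = model id).
val = [(0.10, 0.9), (0.15, 0.6), (0.20, 0.4), (0.30, 0.2), (0.25, 0.5)]
test = [(0.12, 0.9), (0.22, 0.6), (0.18, 0.4), (0.30, 0.2), (0.26, 0.5)]

# Step 1: the MHPO method returns the models that are Pareto-optimal on validation.
val_front = non_dominated(val)

# Step 2: re-evaluate exactly those models on the test set.
test_points = [test[i] for i in val_front]

# Step 3: apply test-set dominance within the validation-derived front.
optimistic = [val_front[i] for i in non_dominated(test_points)]
pessimistic = [val_front[i] for i in non_dominating(test_points)]

# Step 4: compare hypervolumes against an assumed reference point.
ref = (1.0, 1.0)
hv_opt = hypervolume_2d([test[i] for i in optimistic], ref)
hv_pess = hypervolume_2d([test[i] for i in pessimistic], ref)
gap = hv_opt - hv_pess

print("validation front:", val_front)
print("optimistic set:", optimistic)
print("pessimistic set:", pessimistic)
print("hypervolume gap:", gap)
```

In this toy data, model 1 is Pareto-optimal on validation but is dominated by model 2 on the test set, so the two sets differ and the gap is positive. With a front that stays mutually non-dominated on the test set, both sets coincide and the gap is zero.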
