On the Hyperparameter Loss Landscapes of Machine Learning Models: An Exploratory Study

Mingyu Huang, Ke Li

TL;DR

This work addresses the gap in understanding the topography of hyperparameter loss landscapes by mapping and analyzing over 1,500 HP loss landscapes across 6 representative ML models and 63 datasets using a bespoke fitness landscape analysis framework. The approach combines a graph-based landscape representation with six topology metrics and three cross-landscape similarity measures, augmented by HOPE-based embeddings and UMAP visualizations, to reveal consistent structural patterns while quantifying transferability across fidelities, datasets, and models. Key findings show that HP loss landscapes are generally smooth, locally clustered, and highly neutral, with low-fidelity landscapes closely mirroring full-fidelity ones and training/test landscapes largely aligned for several models; NAS benchmarks, however, exhibit greater multimodality and more local optima. These results support the use of transfer and multi-fidelity strategies in HPO, inform search-space design, and offer a scalable lens for broader AutoML and NAS research, while outlining limitations and avenues for extending the framework to more diverse model families and tasks.
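One of the cross-landscape similarity measures named in the figure captions is Spearman correlation. The following is a minimal sketch of how such a comparison could be computed, assuming each landscape is stored as a dictionary that maps a configuration to its loss. The function names, the data layout, and the toy values are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(landscape_a, landscape_b):
    """Spearman correlation over the configurations present in both landscapes.

    Assumes at least two shared configurations and non-constant losses;
    otherwise the denominator is zero.
    """
    shared = sorted(set(landscape_a) & set(landscape_b))
    ra = average_ranks([landscape_a[c] for c in shared])
    rb = average_ranks([landscape_b[c] for c in shared])
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy landscapes keyed by (learning_rate, max_depth).
full_fidelity = {(0.1, 3): 0.30, (0.1, 6): 0.25, (0.3, 3): 0.28, (0.3, 6): 0.40}
low_fidelity = {(0.1, 3): 0.35, (0.1, 6): 0.31, (0.3, 3): 0.33, (0.3, 6): 0.47}

print(spearman(full_fidelity, low_fidelity))
```

A value near 1 means the two landscapes rank configurations almost identically, which is the sense in which a low-fidelity landscape can mirror a full-fidelity one. In practice a library routine such as `scipy.stats.spearmanr` would serve the same purpose.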

Abstract

Previous efforts on hyperparameter optimization (HPO) of machine learning (ML) models predominantly focus on algorithmic advances, yet little is known about the topography of the underlying hyperparameter (HP) loss landscape, which plays a fundamental role in governing the search process of HPO. While several works have conducted fitness landscape analysis (FLA) on various ML systems, they are limited to the properties of isolated landscapes and do not examine the potential structural similarities among them. Exploring such similarities can provide a novel perspective for understanding the mechanisms behind modern HPO methods, but this has been missing, possibly due to the high cost of large-scale landscape construction and the lack of effective analysis methods. In this paper, we mapped 1,500 HP loss landscapes of 6 representative ML models on 63 datasets across different fidelity levels, with 11M+ configurations. By conducting exploratory analysis on these landscapes with fine-grained visualizations and dedicated FLA metrics, we observed a similar landscape topography across a wide range of models, datasets, and fidelities, and shed light on several central topics in HPO.

Paper Structure (27 sections, 6 equations, 15 figures, 12 tables, 3 algorithms)

Figures (15)

  • Figure 1: Fitness landscape of an illustrative $2$D problem.
  • Figure 2: $2$D visualization of HP loss landscapes using our proposed method (see the methodology section) for (A) CNN and (B) XGBoost, w.r.t. $\mathcal{L}_{\mathrm{test}}$ and $\mathcal{L}_{\mathrm{train}}$ as well as different datasets and fidelities (varied through the fraction of training data $\alpha$ or the number of training epochs $E$, marked in red). (C) shows the NASBench-101 landscape with different training epochs on CIFAR-10, and (D) shows the NASBench-201 (❶--❸) and NASBench-360 (❹--❺) loss landscapes on different datasets. Colors indicate performance ranks (lower rank values are better), used to normalize the losses.
  • Figure 3: Distribution of FLA metrics across all datasets for landscapes of (1) $\mathcal{L}_{\mathrm{test}}$, (2) $\mathcal{L}_{\mathrm{train}}$, and (3) $\mathcal{L}_{\mathrm{test}}$ obtained with $10\%$ training data.
  • Figure 4: Distribution of Spearman, Shake-up and $\gamma$-set metrics between (A) $\mathcal{L}_{\mathrm{test}}$ and $\mathcal{L}_{\mathrm{train}}$, (B) $\mathcal{L}_{\mathrm{test}}$ and $\mathcal{L}_{\mathrm{test}}$ with $10\%$ training data, and (C) $\mathcal{L}_{\mathrm{test}}$ across datasets.
  • Figure 5: (A) Scatter plot of $\mathcal{L}_{\mathrm{train}}$ versus $\mathcal{L}_{\mathrm{test}}$ of XGBoost on dataset #44059, with $\bm{\lambda}^*_{\mathrm{test}}$ marked by ☆. (B-E) The same plot colored by HP values; warmer colors indicate higher values.
  • ...and 10 more figures