Multi-objective Hyperparameter Optimization in the Age of Deep Learning

Soham Basu, Frank Hutter, Danny Stoll

TL;DR

This work tackles multi-objective hyperparameter optimization (MO-HPO) for deep learning by introducing PriMO, the first MO-HPO algorithm that incorporates expert priors over multiple objectives and leverages cheap objective proxies. PriMO uses an MO-prior-augmented Bayesian acquisition with random scalarizations and an aggressive, MO-aware initial design to make strong early progress while remaining robust to misleading priors. Empirically, PriMO achieves state-of-the-art performance in both MO and single-objective settings across eight DL benchmarks, and ablation and robustness studies show that it is resilient to varying prior strength. The approach offers practical, budget-efficient HPO for real-world DL tasks, enabling practitioners to navigate Pareto trade-offs more effectively under constrained compute budgets.
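To make the two ingredients named above more concrete, here is a minimal sketch of a random scalarization and a prior-weighted acquisition value. It is an illustration under stated assumptions, not PriMO's actual implementation: the linear scalarization, the uniform simplex weights, the πBO-style exponent `beta / n_evals`, and all function names are hypothetical choices that this summary does not specify.

```python
import random


def sample_simplex_weights(n_objectives, rng):
    """Draw weights uniformly from the probability simplex.

    Normalizing i.i.d. Exponential(1) draws gives a Dirichlet(1, ..., 1) sample.
    """
    draws = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(objectives, weights):
    """Collapse several objective values into one scalar (assumed: linear)."""
    return sum(w * f for w, f in zip(weights, objectives))


def prior_weighted_acquisition(base_acq, prior_density, n_evals, beta=10.0):
    """Multiply an acquisition value by prior_density ** (beta / n_evals).

    Assumption: a piBO-style decay, so the prior dominates early and its
    influence fades as more configurations are evaluated.
    """
    return base_acq * prior_density ** (beta / max(n_evals, 1))


rng = random.Random(0)
weights = sample_simplex_weights(2, rng)           # fresh weights per iteration
scalar_loss = scalarize([0.2, 0.8], weights)       # e.g. (error, cost)
early = prior_weighted_acquisition(2.0, 0.5, n_evals=10)
late = prior_weighted_acquisition(2.0, 0.5, n_evals=1000)
```

With these assumed settings, `early` is pulled down strongly by a low prior density while `late` stays close to the unweighted acquisition value, which is one way an optimizer can recover from a misleading prior.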

Abstract

While Deep Learning (DL) experts often have prior knowledge about which hyperparameter settings yield strong performance, only a few Hyperparameter Optimization (HPO) algorithms can leverage such prior knowledge, and none incorporate priors over multiple objectives. As DL practitioners often need to optimize not just one but many objectives, this is a blind spot in the algorithmic landscape of HPO. To address this shortcoming, we introduce PriMO, the first HPO algorithm that can integrate multi-objective user beliefs. We show PriMO achieves state-of-the-art performance across 8 DL benchmarks in both the multi-objective and single-objective settings, clearly positioning itself as the new go-to HPO algorithm for DL practitioners.

Paper Structure

This paper contains 79 sections, 14 equations, 14 figures, 8 tables, 3 algorithms.

Figures (14)

  • Figure 1: Comparison of PriMO and prominent multi-objective algorithms. [Left] Mean relative ranks across 8 DL benchmarks, averaged over all prior conditions. [Right] Mean dominated Hypervolume for tuning the hyperparameters of a language model, demonstrating that PriMO can leverage a good prior to offer speedups of up to $\sim$10x.
  • Figure 2: Mean relative ranks $\pm$ 1 standard error across benchmarks and seeds under various prior conditions for randomly sampling from the priors, MOASHA, and adaptations of it that utilize multi-objective expert priors. See Section \ref{sec:exp} for details on the evaluation protocol.
  • Figure 3: Mean relative ranks $\pm$ 1 standard error of PriMO and prominent multi-objective algorithms across benchmarks and seeds under various prior conditions.
  • Figure 4: Mean dominated Hypervolume $\pm$ 1 standard error of PriMO and prominent multi-objective algorithms across seeds for each benchmark. PriMO is shown here under the all-good-priors setting. See Appendix \ref{app:all_exp} for additional Hypervolume plots.
  • Figure 5: Mean relative ranks $\pm$ 1 standard error of baselines we constructed to use multi-objective priors and PriMO across benchmarks and seeds under various prior conditions.
  • ...and 9 more figures
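
Several captions above report the dominated Hypervolume. As a rough illustration of what that metric measures, the sketch below computes it for two objectives. It assumes both objectives are minimized and that a reference point is supplied by the user; the function name and the example points are hypothetical, and the paper's exact evaluation protocol is not shown in this excerpt.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (2 minimized objectives)."""
    # Only points strictly better than the reference in both objectives contribute.
    candidates = sorted(
        p for p in points if p[0] < reference[0] and p[1] < reference[1]
    )
    volume = 0.0
    best_f2 = reference[1]
    # Sweep in increasing f1; each new best f2 adds one rectangular slab.
    for f1, f2 in candidates:
        if f2 < best_f2:
            volume += (reference[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
hv = hypervolume_2d(front, reference=(4.0, 4.0))
hv_with_dominated = hypervolume_2d(front + [(3.0, 3.0)], reference=(4.0, 4.0))
```

A larger value means the evaluated configurations cover more of the objective space below the reference point; adding a dominated point such as `(3.0, 3.0)` leaves the value unchanged.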