
A Trajectory-Based Bayesian Approach to Multi-Objective Hyperparameter Optimization with Epoch-Aware Trade-Offs

Wenyu Wang, Zheyi Fan, Szu Hui Ng

TL;DR

This work introduces Enhanced MOHPO (EMOHPO), which treats the number of training epochs as a decision variable alongside the hyperparameters to reveal trajectory-based trade-offs during iterative learning. It develops Trajectory-based MOBO (TMOBO), which combines a trajectory-aware acquisition function (TEHVI) with a conservative early-stopping mechanism to enable efficient, epoch-aware multi-objective optimization. The approach is validated on synthetic problems and real ML hyperparameter tuning benchmarks, where it discovers better Pareto fronts and reduces training cost through trajectory exploitation and selective augmentation. The results suggest practical value for hyperparameter tuning and model retraining scenarios in which early performance signals reveal critical trade-offs, such as mitigating overfitting and choosing efficient deployment configurations. Overall, EMOHPO and TMOBO advance multi-objective hyperparameter optimization in settings where learning dynamics across epochs carry essential information about competing objectives.
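The conservative early-stopping idea can be illustrated with a minimal sketch. The rule below is an assumption made for illustration and is not the paper's mechanism: training of the current hyperparameter setting stops only when even an optimistic estimate (predictive mean minus `kappa` standard deviations) of every remaining epoch is weakly dominated by the current front together with the epochs already observed. The names `should_stop`, `weakly_dominated`, and `kappa`, and the choice of (validation loss, training cost) as the two minimized objectives, are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical objective pair: (validation loss, training cost), both minimized.
Point = Tuple[float, float]


def weakly_dominated(q: Point, archive: Sequence[Point]) -> bool:
    """True if some archived point is no worse than q in both objectives."""
    return any(p[0] <= q[0] and p[1] <= q[1] for p in archive)


def should_stop(
    observed: Sequence[Point],
    pred_mean: Sequence[Point],
    pred_std: Sequence[Point],
    front: Sequence[Point],
    kappa: float = 2.0,
) -> bool:
    """Hypothetical conservative rule: stop only if no remaining epoch can help.

    observed: (loss, cost) of the epochs trained so far for the current setting.
    pred_mean / pred_std: surrogate predictions for the remaining epochs.
    front: current Pareto front built from previously evaluated (x, t) pairs.
    """
    archive = list(front) + list(observed)
    for (m_loss, m_cost), (s_loss, s_cost) in zip(pred_mean, pred_std):
        # Optimistic (lower-confidence-bound) guess for this future epoch.
        optimistic = (m_loss - kappa * s_loss, m_cost - kappa * s_cost)
        if not weakly_dominated(optimistic, archive):
            return False  # this epoch might still extend the front: keep training
    return True  # even optimistic futures add nothing: stop early


front = [(0.30, 4.0), (0.50, 2.0)]
observed = [(0.60, 1.0), (0.45, 2.0)]
std = [(0.01, 0.0), (0.01, 0.0)]
print(should_stop(observed, [(0.44, 3.0), (0.46, 4.0)], std, front))  # False
print(should_stop(observed, [(0.50, 3.0), (0.55, 4.0)], std, front))  # True
```

In the second call the predicted loss rises after epoch 2 (overfitting), so every optimistic future point is dominated and training stops; a larger `kappa` makes the rule more conservative, because it takes a more pessimistic view of when nothing can be gained.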

Abstract

Training machine learning models inherently involves a resource-intensive and noisy iterative learning procedure that allows epoch-wise monitoring of the model performance. However, the insights gained from the iterative learning procedure typically remain underutilized in multi-objective hyperparameter optimization scenarios. Despite the limited research in this area, existing methods commonly identify the trade-offs only at the end of model training, overlooking the fact that trade-offs can emerge at earlier epochs in cases such as overfitting. To bridge this gap, we propose an enhanced multi-objective hyperparameter optimization problem that treats the number of training epochs as a decision variable, rather than merely an auxiliary parameter, to account for trade-offs at an earlier training stage. To solve this problem and accommodate its iterative learning, we then present a trajectory-based multi-objective Bayesian optimization algorithm characterized by two features: 1) a novel acquisition function that captures the improvement along the predictive trajectory of model performances over epochs for any hyperparameter setting and 2) a multi-objective early stopping mechanism that determines when to terminate the training to maximize epoch efficiency. Experiments on synthetic simulations and hyperparameter tuning benchmarks demonstrate that our algorithm can effectively identify the desirable trade-offs while improving tuning efficiency.
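A trajectory-based improvement can be sketched as the hypervolume gained when all predicted epochs of one candidate setting are added to the current front at once, averaged over sampled trajectories. The Monte Carlo sketch below uses two minimized objectives and assumes, for illustration only, independent Gaussian predictions per epoch and objective (a GP over epochs, as used in the paper, would correlate them). It is not the paper's TEHVI formula, and `tehvi_mc`, `trajectory_hvi`, `ref`, and `n_samples` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (validation loss, cost), both minimized


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Non-dominated subset, sorted by the first objective."""
    front, best_second = [], float("inf")
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    hv, prev_second = 0.0, ref[1]
    for first, second in front:  # first ascending, second descending
        hv += (ref[0] - first) * (prev_second - second)
        prev_second = second
    return hv


def trajectory_hvi(traj: Sequence[Point], front: Sequence[Point], ref: Point) -> float:
    """Hypervolume gained by adding every epoch of one trajectory at once."""
    return hypervolume_2d(list(front) + list(traj), ref) - hypervolume_2d(front, ref)


def tehvi_mc(mean, std, front, ref, n_samples=256, seed=0) -> float:
    """Monte Carlo estimate of the expected trajectory-based improvement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Assumption: independent Gaussian per epoch and per objective.
        traj = [(rng.gauss(m0, s0), rng.gauss(m1, s1))
                for (m0, m1), (s0, s1) in zip(mean, std)]
        total += trajectory_hvi(traj, front, ref)
    return total / n_samples


front = [(0.30, 4.0), (0.45, 2.0), (0.60, 1.0)]
ref = (1.0, 10.0)
mean = [(0.70, 1.0), (0.42, 3.0)]  # predicted (loss, cost) at epochs 1 and 2
print(trajectory_hvi(mean, front, ref))  # about 0.03
print(tehvi_mc(mean, [(0.05, 0.0), (0.05, 0.0)], front, ref))
```

Scoring the whole trajectory, rather than only its last epoch, is what lets an early epoch such as (0.42, 3.0) earn credit even if later epochs are dominated.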

Paper Structure (30 sections, 2 theorems, 25 equations, 14 figures, 2 tables, 2 algorithms)

Key Result

Lemma 1

Let $X_{Trj}^*$ denote the set of hyperparameter settings that belong to the Pareto-optimal set of EMOHPO, i.e., $X_{Trj}^* := \{\bm{x} \in \mathbb{X} \mid \exists t \in \mathbb{T}, \nexists (\bm{x}', t') \in \mathbb{X} \times \mathbb{T}, \bm{f}(\bm{x}', t') \prec \bm{f}(\bm{x}, t)\}$. Then, the Par
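The set $X_{Trj}^*$ in Lemma 1 can be computed directly when settings and epochs are finite. The sketch below assumes that $\prec$ denotes Pareto dominance under minimization and uses made-up learning curves of (validation loss, cost) in which setting `x2` overfits after epoch 2; the helper names `dominates`, `nondominated_pairs`, and `trajectory_pareto_set` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Objectives = Tuple[float, ...]  # e.g. (validation loss, cost), all minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization: no worse everywhere, better somewhere."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


def nondominated_pairs(curves: Dict[str, Sequence[Objectives]]) -> List[Tuple[str, int]]:
    """All (x, t) whose f(x, t) is not dominated by any f(x', t')."""
    pairs = [(x, t, f) for x, traj in curves.items() for t, f in enumerate(traj, start=1)]
    return [(x, t) for x, t, f in pairs
            if not any(dominates(g, f) for _, _, g in pairs)]


def trajectory_pareto_set(curves: Dict[str, Sequence[Objectives]]) -> Set[str]:
    """X_Trj^*: settings with at least one non-dominated epoch."""
    return {x for x, _ in nondominated_pairs(curves)}


# Hypothetical learning curves: f(x, t) = (validation loss, cost) for t = 1, 2, 3.
curves = {
    "x1": [(0.90, 1.0), (0.60, 2.0), (0.35, 3.0)],  # keeps improving
    "x2": [(0.70, 1.0), (0.40, 2.0), (0.55, 3.0)],  # overfits after epoch 2
    "x3": [(0.95, 1.0), (0.80, 2.0), (0.70, 3.0)],  # dominated at every epoch
}
print(nondominated_pairs(curves))             # [('x1', 3), ('x2', 1), ('x2', 2)]
print(sorted(trajectory_pareto_set(curves)))  # ['x1', 'x2']
```

Setting `x2` enters $X_{Trj}^*$ only through epochs 1 and 2; judged at its final epoch alone, its point (0.55, 3.0) would be dominated by (0.35, 3.0) of `x1`.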

Figures (14)

  • Figure 1: (A) and (B): Learning curves of three hyperparameter settings $\bm{x}_1$, $\bm{x}_2$, and $\bm{x}_3 \in \mathbb{R}^d$; (C): Trajectories of $\bm{x}_1$, $\bm{x}_2$, and $\bm{x}_3$ and trade-offs over their model performances; (D) Trajectory-based improvement and early stopping when a new $\bm{x}' \in \mathbb{R}^d$ is sampled. $L(\bm{x}, t)$ (or $C(\bm{x}, t)$) denotes the validation loss (or cost) of training with $\bm{x} \in \mathbb{R}^d$ for $t$ epochs. The maximum number of epochs is 10.
  • Figure 2: Box plots for 20 problems derived from ZDT1, ZDT2, DTLZ1, DTLZ2, and DTLZ7. Each algorithm runs for 20 independent trials. The log hypervolume (HV) difference is computed at the end of each trial and normalized to $[0, 1]$.
  • Figure 3: Average log HV difference against time for each algorithm on five different hyperparameter tuning tasks. Each algorithm runs for 20 independent trials. The shaded region indicates two standard errors of the mean.
  • Figure 4: Average log HV difference against wall-clock time for each algorithm on the hyperparameter tuning task of MobileNetV2 on the CIFAR-10 dataset. The shaded region indicates two standard errors of the mean.
  • Figure B.1: An illustrative example of GP prediction obtained after running TMOBO on the kc1 hyperparameter tuning task for 35 iterations. [Left] GP prediction (green curve) of validation loss for the selected hyperparameter setting over epochs 1 to 50 before any true observations (orange stars) from its learning curve are known. [Right] Updated GP predictions after some observations (red stars) on the learning curve are revealed and incorporated into the GP model.
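Figures 2 to 4 report a log hypervolume difference, which measures how far the hypervolume of the found front falls short of that of the true front (lower is better). The captions do not give the exact recipe, so the sketch below assumes a base-10 logarithm, a small floor to avoid $\log 0$, and min-max normalization over all trials of one problem; `log_hv_difference`, `minmax_normalize`, and `floor` are hypothetical names, and the hypervolume values are made up.

```python
import math
from typing import List, Sequence


def log_hv_difference(hv_true: float, hv_found: float, floor: float = 1e-12) -> float:
    """log10 of the gap to the true front's hypervolume (floored to avoid log(0))."""
    return math.log10(max(hv_true - hv_found, floor))


def minmax_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; constant inputs all map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical final hypervolumes of three trials against a true HV of 1.0.
logs = [log_hv_difference(1.0, hv) for hv in (0.9, 0.99, 0.999)]
print(minmax_normalize(logs))  # approximately [1.0, 0.5, 0.0]
```

Working on the log scale keeps trials that differ by orders of magnitude in their remaining gap visually comparable in a single box plot.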

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Lemma 2