Hyperparameter Tuning Through Pessimistic Bilevel Optimization

Meltem Apaydin Ustun, Liang Xu, Bo Zeng, Xiaoning Qian

TL;DR

This work introduces pessimistic bilevel optimization (PBL) for hyperparameter tuning to address inner-level non-uniqueness and model uncertainty. It defines a three-level PBL with a relaxation-based approximation strategy that yields robust hyperparameters by optimizing the outer objective against the worst-case inner solutions, and it extends this framework with an $\boldsymbol{\varepsilon}$-approximation to hedge against training-time suboptimalities. The authors provide both theoretical reformulations and a practical solution pathway, including an optimistic baseline and a pessimistic, KKT-based single-level reformulation with label flipping to handle non-convexities. Empirically, pessimistic tuning outperforms optimistic tuning in small-data regimes and under perturbed or distribution-shifted test data, demonstrated on UCI datasets and transfer-learning-style MNIST/FashionMNIST experiments, highlighting its potential for robust AutoML in few-shot and adversarial settings.

Abstract

Automated hyperparameter search in machine learning, especially for deep learning models, is typically formulated as a bilevel optimization problem, with hyperparameter values determined by the upper level and the model learning achieved by the lower-level problem. Most of the existing bilevel optimization solutions either assume the uniqueness of the optimal training model given hyperparameters or adopt an optimistic view when the non-uniqueness issue emerges. Potential model uncertainty may arise when training complex models with limited data, especially when the uniqueness assumption is violated. Thus, the suitability of the optimistic view underlying current bilevel hyperparameter optimization solutions is questionable. In this paper, we propose pessimistic bilevel hyperparameter optimization to assure appropriate outer-level hyperparameters to better generalize the inner-level learned models, by explicitly incorporating potential uncertainty of the inner-level solution set. To solve the resulting computationally challenging pessimistic bilevel optimization problem, we develop a novel relaxation-based approximation method. It derives pessimistic solutions with more robust prediction models. In our empirical studies of automated hyperparameter search for binary linear classifiers, pessimistic solutions have demonstrated better prediction performances than optimistic counterparts when we have limited training data or perturbed testing data, showing the necessity of considering pessimistic solutions besides existing optimistic ones.
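
For orientation, the two viewpoints contrasted in the abstract can be written in generic bilevel notation. This is a sketch in standard notation, not necessarily the paper's exact formulation: the symbols $F$ (outer, validation loss), $f$ (inner, training loss), and $w$ (model parameters) are assumptions introduced here, while $\lambda$, $\Psi(\lambda)$, $P^*_o$, and $P^*_p$ follow the notation of Proposition 1.

$$
\begin{aligned}
\Psi(\lambda) &= \arg\min_{w} f(\lambda, w), \\
\text{optimistic:}\quad P^*_o &= \min_{\lambda}\ \min_{w \in \Psi(\lambda)} F(\lambda, w), \\
\text{pessimistic:}\quad P^*_p &= \min_{\lambda}\ \max_{w \in \Psi(\lambda)} F(\lambda, w).
\end{aligned}
$$

Under the same assumptions, an $\varepsilon$-approximation would plausibly replace $\Psi(\lambda)$ with the set of $\varepsilon$-optimal inner solutions, $\Psi_\varepsilon(\lambda) = \{\, w : f(\lambda, w) \leq \min_{w'} f(\lambda, w') + \varepsilon \,\}$, so that the outer level also hedges against inner solutions that are only approximately trained.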

Paper Structure

This paper contains 22 sections, 5 theorems, 28 equations, 9 figures, 4 tables.

Key Result

Proposition 1

We have $P^*_o\leq P^*_p\leq P_p(\lambda^*_o,\cdot)$, and the equalities hold if $\Psi(\lambda)$ is a singleton for all $\lambda$.
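
The inequality chain can be checked numerically on a tiny finite problem. The sketch below is illustrative only: the hyperparameter names, inner solutions, and loss values are invented, and it reads $P_p(\lambda^*_o,\cdot)$ as the worst-case outer loss at the optimistic hyperparameter $\lambda^*_o$, which is an interpretation rather than the paper's stated definition.

```python
# Toy check of P_o* <= P_p* <= P_p(lambda_o*, .) on a finite problem.
# All names and numbers are hypothetical and chosen for illustration.

# Inner solution sets Psi(lambda): the inner (training) solutions that are
# equally optimal for each hyperparameter value.
PSI = {
    "lam_a": ["w1", "w2"],  # non-unique inner solution
    "lam_b": ["w3"],        # unique inner solution (singleton)
}

# Outer (validation) loss F(lambda, w) for each admissible pair.
F = {
    ("lam_a", "w1"): 0.10,
    ("lam_a", "w2"): 0.50,
    ("lam_b", "w3"): 0.30,
}


def optimistic_value(lam):
    """Best-case outer loss over the inner solution set of lam."""
    return min(F[(lam, w)] for w in PSI[lam])


def pessimistic_value(lam):
    """Worst-case outer loss over the inner solution set of lam."""
    return max(F[(lam, w)] for w in PSI[lam])


# Outer-level minimization under each viewpoint.
lam_o = min(PSI, key=optimistic_value)
lam_p = min(PSI, key=pessimistic_value)

P_o = optimistic_value(lam_o)            # optimistic optimum
P_p = pessimistic_value(lam_p)           # pessimistic optimum
P_p_at_lam_o = pessimistic_value(lam_o)  # worst case if lam_o is deployed

assert P_o <= P_p <= P_p_at_lam_o
```

In this toy instance the optimistic choice picks the hyperparameter whose best inner solution looks good but whose worst one is poor, while the pessimistic choice prefers the hyperparameter with a unique inner solution. If every set in `PSI` were a singleton, the three values would coincide, matching the equality condition in the proposition.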

Figures (9)

  • Figure 1: Comparison of pessimistic and optimistic solutions by the average testing accuracy of ten runs with random splits between training and validation sets.
  • Figure 2: Comparison of pessimistic and optimistic solutions by the average accuracy with respect to varying total sample size ($|V|+|T|$) and training-to-validation splitting ratio ($|T|/|V|$). Blue circles: performance results of optimistic models; black circles: performance results of pessimistic models.
  • Figure 3: Average testing accuracy for varying training and validation set sizes for the Cancer data set.
  • Figure 4: Testing accuracy comparison for various values of $\varepsilon$ and perturbation bound $\rho$ on $\|\Delta\|$ with clean validation data.
  • Figure 5: Testing accuracy comparison for various values of $\varepsilon$ and perturbation bound $\rho$ on $\|\Delta\|$ with perturbed validation data.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Theorem 3
  • Remark 1
  • Corollary 4
  • Proposition 5