
Dynamic Priors in Bayesian Optimization for Hyperparameter Optimization

Lukas Fehring, Marcel Wever, Maximilian Spliethöver, Leona Hennig, Henning Wachsmuth, Marius Lindauer

TL;DR

DynaBO introduces dynamic priors for Bayesian optimization (BO) in hyperparameter optimization. Users can inject priors at any point during the optimization and stack multiple priors whose influence decays over time. It combines a prior-weighted acquisition function with a prior rejection mechanism that safeguards against misleading inputs, preserving BO's convergence guarantees while offering potential acceleration from informative priors. The approach is theoretically grounded, with proofs of almost sure convergence and of robustness to adversarial priors, and it empirically achieves speedups over vanilla BO and $\pi$BO across diverse benchmarks. This work advances human-in-the-loop AutoML by enabling online collaboration between users and optimization algorithms, and it discusses practical robustness and scalability considerations.
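The TL;DR does not state the exact weighting formula. As a minimal sketch, assuming a $\pi$BO-style rule in which each prior $\pi^{i}$ provided at iteration $t_i$ multiplies the acquisition value by $\pi^{i}(\lambda)^{\beta/(t - t_i + 1)}$, time-decayed stacking of priors could look like the code below. The exponent schedule, the default `beta`, and the function names are hypothetical illustrations, not the paper's formula.

```python
from typing import Sequence, Tuple


def decayed_prior_weight(pi_value: float, t: int, t_prior: int, beta: float = 10.0) -> float:
    """Hypothetical piBO-style multiplier pi(lambda) ** (beta / (t - t_prior + 1)).

    pi_value is the prior density at the candidate configuration and t_prior
    the iteration at which the user provided the prior. Before t_prior the
    prior has no effect (multiplier 1.0); afterwards the exponent shrinks with
    time, so the multiplier drifts back toward 1.0.
    """
    if t < t_prior:
        return 1.0
    return pi_value ** (beta / (t - t_prior + 1))


def prior_weighted_acquisition(
    acq_value: float,
    priors: Sequence[Tuple[float, int]],
    t: int,
    beta: float = 10.0,
) -> float:
    """Multiply a base acquisition value by the decayed weights of all priors.

    priors holds (pi_value, t_prior) pairs, one per user-provided prior.
    """
    weight = 1.0
    for pi_value, t_prior in priors:
        weight *= decayed_prior_weight(pi_value, t, t_prior, beta)
    return acq_value * weight


if __name__ == "__main__":
    # Three priors provided at t = 10, 20, 30, each with density 0.5 at the candidate.
    priors = [(0.5, 10), (0.5, 20), (0.5, 30)]
    for t in (5, 10, 30, 100, 1000):
        print(t, prior_weighted_acquisition(1.0, priors, t))
```

Under this assumed rule, with priors at $t=10$, $20$, and $30$ and $\pi(\lambda)=0.5$, the multiplier drops sharply whenever a new prior arrives and then drifts back toward 1, so the priors' influence fades and the unweighted acquisition function dominates in the long run.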

Abstract

Hyperparameter optimization (HPO), for example based on Bayesian optimization (BO), supports users in designing models well suited to a given dataset. HPO has proven effective in many applications, ranging from classical machine learning on tabular data to deep neural networks for computer vision and transformers for natural language processing. However, machine learning experts are still sometimes reluctant to adopt HPO because of its black-box nature and limited user control. To address this, initial approaches have been proposed that initialize BO methods with expert knowledge. However, these approaches do not allow for online steering during the optimization process. In this paper, we introduce a novel method that enables repeated interventions to steer BO via user input, specifying expert knowledge and user preferences at runtime of the HPO process in the form of prior distributions. To this end, we generalize an existing method, $\pi$BO, while preserving its theoretical guarantees. We also introduce a misleading prior detection scheme that protects against harmful user inputs. In our experimental evaluation, we demonstrate that our method can effectively incorporate multiple priors, leveraging informative priors, whereas misleading priors are reliably rejected or overcome. In this way, our method remains competitive with unperturbed BO.
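The abstract does not detail how misleading priors are detected. The sketch below is one hypothetical rule, not the paper's scheme. It assumes minimization and a surrogate model whose predicted mean is available as a function; it samples configurations from the user's prior and rejects the prior if no sampled configuration is predicted to improve on the incumbent by at least `margin`. The function name, parameters, and one-dimensional toy setup are illustrative assumptions.

```python
import random
from typing import Callable


def should_reject_prior(
    sample_from_prior: Callable[[random.Random], float],
    surrogate_mean: Callable[[float], float],
    incumbent: float,
    n_samples: int = 256,
    margin: float = 0.0,
    seed: int = 0,
) -> bool:
    """Hypothetical misleading-prior check for a minimization problem.

    Draws n_samples configurations from the user's prior and returns True
    (reject) if none of them is predicted by the surrogate to improve on the
    incumbent's predicted value by at least `margin`.
    """
    rng = random.Random(seed)
    best_in_prior = min(surrogate_mean(sample_from_prior(rng)) for _ in range(n_samples))
    return best_in_prior > surrogate_mean(incumbent) - margin


if __name__ == "__main__":
    # Toy 1-D surrogate whose predicted mean is minimized at lambda = 0.3.
    def surrogate(x: float) -> float:
        return (x - 0.3) ** 2

    # A prior concentrated near the predicted optimum ...
    def informative(rng: random.Random) -> float:
        return rng.gauss(0.3, 0.05)

    # ... and one concentrated far away from it.
    def misleading(rng: random.Random) -> float:
        return rng.gauss(0.9, 0.02)

    incumbent = 0.35
    print("reject informative:", should_reject_prior(informative, surrogate, incumbent))
    print("reject misleading:", should_reject_prior(misleading, surrogate, incumbent))
```

In this toy setting, the prior centered near the surrogate's minimum is accepted, while the one concentrated far away is rejected. A surrogate-based check like this inherits the surrogate's errors, which is one reason an interactive system may let the user overrule a rejection.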

Paper Structure

This paper contains 39 sections, 3 theorems, 34 equations, 10 figures, 2 tables.

Key Result

Theorem 1

Under the assumptions stated in the appendix, the sequence of query points selected by DynaBO, $\{\lambda_t\}_{t=1}^{\infty}\subset\Lambda$, converges almost surely to the global optimum; that is, irrespective of the variation in priors, the method converges to an optimal configuration with probability one.
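"Almost surely" means that the convergence event has probability one. One common way to write such a statement for BO is shown below. It assumes minimization of an objective $f$ over $\Lambda$ and measures convergence through the best value observed so far; the symbols $f$ and $\lambda^*$ and this incumbent-based form are assumptions, not the paper's exact statement.

$$
\Pr\left( \lim_{t \to \infty} \min_{s \le t} f(\lambda_s) = f(\lambda^*) \right) = 1,
\qquad \lambda^* \in \arg\min_{\lambda \in \Lambda} f(\lambda).
$$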

Figures (10)

  • Figure 1: Overview of the proposed dynamic Bayesian optimization (DynaBO) method. Given a dataset, a performance measure, a configuration space, and an optional initial prior, the loop iteratively selects new hyperparameter configurations. At each step, the selected candidate configuration is evaluated and its performance is assessed. The process continues until an optimized configuration is identified. The framework allows users to steer the optimization process by dynamically adding priors at runtime.
  • Figure 2: Impact on the acquisition function of priors $\pi^1,\pi^2,\pi^3$, provided at $t=10$, $20$, and $30$, with $\pi(\lambda)=0.5$.
  • Figure 3: Illustration of the candidate selection process in DynaBO, incorporating a user-provided prior $\pi^{(i)}$. A safeguard mechanism evaluates the prior, determining whether to accept or reject it. If accepted, the candidate selection is biased by the prior; otherwise, the user can overrule the rejection.
  • Figure 4: Mean regret for lcbench, xgboost, and PD1 using Expert, Advanced, Local, and Adversarial priors. Priors are provided at the iterations marked by vertical lines. The shaded areas visualize the standard error. For lcbench and xgboost, the plots average over all datasets. The results indicate that DynaBO outperforms $\pi$BO and remains competitive with vanilla BO under adversarial priors.
  • Figure 5: Anytime regret for PD1, averaged over $30$ seeds and scenarios, comparing vanilla BO, $\pi$BO, DynaBO accepting all priors (DynaBO-accept all priors), and DynaBO with validation (DynaBO-validation).
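The figure captions report mean (anytime) regret across seeds, with shaded standard errors. The following is a minimal sketch of that aggregation. It assumes minimization and treats regret as the gap between the best value observed so far and a known optimum `f_opt`; both are assumptions, since the captions do not define regret. The helper names are hypothetical, and the numbers in the demo are toy values, not benchmark data.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def anytime_regret(observed: Sequence[float], f_opt: float) -> List[float]:
    """Regret of the incumbent after each evaluation (minimization assumed):
    r_t = min_{s <= t} y_s - f_opt."""
    regrets: List[float] = []
    best = math.inf
    for y in observed:
        best = min(best, y)
        regrets.append(best - f_opt)
    return regrets


def mean_and_stderr(curves: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-iteration mean and standard error (sample std / sqrt(n)) across seeds.

    All curves are assumed to have the same length (one entry per iteration).
    """
    n = len(curves)
    means: List[float] = []
    errs: List[float] = []
    for values in zip(*curves):
        means.append(mean(values))
        errs.append(stdev(values) / math.sqrt(n) if n > 1 else 0.0)
    return means, errs


if __name__ == "__main__":
    # Toy observations from two seeds (illustrative numbers, not benchmark data).
    runs = [[0.5, 0.3, 0.4, 0.1], [0.6, 0.2, 0.2, 0.2]]
    curves = [anytime_regret(run, f_opt=0.0) for run in runs]
    m, se = mean_and_stderr(curves)
    print("mean regret:", m)
    print("standard error:", se)
```

Taking the running minimum before averaging is what makes each curve "anytime": it is non-increasing per seed, so the plotted mean reflects the best configuration found up to each iteration rather than the last one evaluated.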

Theorems & Definitions (6)

  • Theorem 1: Almost Sure Convergence of DynaBO
  • Theorem 2: Robustness to Misleading Priors
  • Theorem 3: Acceleration of Convergence with Informative Priors
  • proof
  • proof
  • proof