Hybrid Parameter Search and Dynamic Model Selection for Mixed-Variable Bayesian Optimization

Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu

TL;DR

The paper tackles optimization of expensive black-box functions with mixed inputs by introducing hybridM, a hybrid Bayesian optimization framework that uses MCTS for the categorical space and Gaussian Processes for the continuous space, with online kernel selection. It introduces a novel UCTS update strategy for the categorical search and a rank-based kernel selection criterion that balances likelihood and acquisition. Empirical results on synthetic benchmarks and real applications (neural networks, STRUMPACK) show that hybridM achieves faster convergence and higher optima than competing mixed-variable BO methods, particularly in settings with many categories or inactive variables. This work advances auto-tuning for ML and HPC codes by providing an efficient, dynamic surrogate modeling approach and a scalable tree-based search.

Abstract

This paper presents a new type of hybrid model for Bayesian optimization (BO) adept at managing mixed variables, encompassing both quantitative (continuous and integer) and qualitative (categorical) types. Our proposed new hybrid models (named hybridM) merge the Monte Carlo Tree Search structure (MCTS) for categorical variables with Gaussian Processes (GP) for continuous ones. hybridM leverages the upper confidence bound tree search (UCTS) for MCTS strategy, showcasing the tree architecture's integration into Bayesian optimization. Our innovations, including dynamic online kernel selection in the surrogate modeling phase and a unique UCTS search strategy, position our hybrid models as an advancement in mixed-variable surrogate models. Numerical experiments underscore the superiority of hybrid models, highlighting their potential in Bayesian optimization.
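To make the tree-search idea concrete, the following is a minimal sketch of the standard UCB1 rule applied to a single categorical variable. It is not the paper's UCTS update strategy, whose details are not given in this summary. The function name `ucb1_select`, its arguments, and the exploration constant `c` are hypothetical, chosen only for illustration.

```python
import math


def ucb1_select(counts, mean_rewards, c=math.sqrt(2)):
    """Pick a category index with the standard UCB1 rule (illustrative only).

    counts[i]       -- how many times category i has been evaluated
    mean_rewards[i] -- average observed objective value for category i
    c               -- exploration weight (assumed; sqrt(2) gives classic UCB1)
    """
    # Evaluate every category at least once before trusting the bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    # Score = exploitation term + exploration bonus that shrinks with visits.
    scores = [
        m + c * math.sqrt(math.log(total) / n)
        for m, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely visited category can win despite a lower mean reward.
choice = ucb1_select(counts=[10, 1, 5], mean_rewards=[0.5, 0.4, 0.6])
```

In a hybrid scheme of the kind the abstract describes, a rule like this would choose the categorical values, and a GP surrogate would then be fitted over the continuous variables for that choice.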

Paper Structure (18 sections, 16 equations, 8 figures, 6 tables, 1 algorithm)


Figures (8)

  • Figure 1: Algorithmic representation of the update step (i.e., heuristic search for the next sequential sample) in the proposed hybrid model, where we provide two different strategies, including UCTS (hybridM). Note that the GPs with different kernels can be fitted in parallel rather than sequentially.
  • Figure 2: Performance comparison (line plots of performance per budget and box plots of final optima) over 20 batches for: (row 1) a scaled version of the function func3C in ru_bayesian_2020 with maximum 55; (row 2) a scaled version of the 7-dimensional discrete Rosenbrock function (4 continuous variables in $[-5,5]$ and 3 categorical variables in $\{-5,-4,\cdots,4,5\}$) in Malkomes2016Selection with maximum 0; (row 3) function \ref{eq:Friedman-8C} with maximum 30.
  • Figure 3: Comparison of performance between the GP surrogate whose kernel is chosen by the selection criterion \ref{eq:C_k_custom} and GP surrogates with different fixed kernels, on the functions func3C in ru_bayesian_2020, \ref{eq:discrete rosenbrock_rosenbrock}, and \ref{eq:Friedman-8C} over 20 batches. The actual maximum is shown by dashed lines.
  • Figure 4: Comparison of performance between the hybrid models with different selection criteria: acq (acquisition function only), AIC, BIC, HQC, loglik (log likelihood), and $R_{1/2}$ in \ref{eq:C_k_custom}, on the functions func3C in ru_bayesian_2020, \ref{eq:discrete rosenbrock_rosenbrock}, and \ref{eq:Friedman-8C} over 20 batches. The actual maximum is shown by dashed lines.
  • Figure 5: Boxplots comparing the performance of different tuning methods on the regression neural network specified in Table \ref{tab:Different-hyper-parameters-in} for the datasets in Table \ref{tab:Data-format} over 10 repeated batches. The theoretical minimum in-sample MSE is 0. The average number of iterations needed to attain the optimum is displayed at the top of each box.
  • ...and 3 more figures
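
The likelihood-based criteria named in the Figure 4 caption (AIC, BIC, HQC) have standard textbook definitions, sketched below for reference. The function name `information_criteria` and its arguments are hypothetical. The sketch does not reproduce the paper's $R_{1/2}$ criterion or the way the paper combines likelihood with the acquisition function.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return the standard AIC, BIC and HQC values (lower is better).

    log_lik  -- maximized log likelihood of the fitted model
    n_params -- number of free (hyper)parameters, e.g. kernel parameters
    n_obs    -- number of observations; must exceed 1 so log(log(n)) exists
    """
    if n_obs <= 1:
        raise ValueError("n_obs must be greater than 1")
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    hqc = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_lik
    return {"AIC": aic, "BIC": bic, "HQC": hqc}


# Example: compare two hypothetical kernels fitted to the same 20 samples.
simple_kernel = information_criteria(log_lik=-12.0, n_params=2, n_obs=20)
rich_kernel = information_criteria(log_lik=-10.0, n_params=5, n_obs=20)
```

Each criterion penalizes the log likelihood by model size, so a kernel with more parameters must fit noticeably better to be preferred.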

Theorems & Definitions (1)

  • Example