Bayesian Optimization for Simultaneous Selection of Machine Learning Algorithms and Hyperparameters on Shared Latent Space

Kazuki Ishikawa, Ryota Ozaki, Yohei Kanzaki, Ichiro Takeuchi, Masayuki Karasuyama

TL;DR

The paper tackles the Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem by embedding the heterogeneous hyper-parameter spaces of multiple ML algorithms into a shared latent space and running Bayesian optimization there. It introduces a multi-task Gaussian process surrogate on the latent space, coupled with a learnable embedding for each algorithm, and adds pre-training with adversarial regularization plus a learning-to-rank-based selection of pre-trained embedding models (PTEMs) for new datasets. Together, these three components enable information sharing across algorithms, improve early search efficiency, and reduce the total number of observations required to identify high-performing CASH configurations. Empirical results on OpenML datasets show the approach outperforming the compared baselines, with ablations confirming the value of pre-training and of the ranking-based PTEM selection for practical AutoML workflows.

Abstract

Selecting the optimal combination of a machine learning (ML) algorithm and its hyper-parameters is crucial for the development of high-performance ML systems. However, because the number of combinations of ML algorithms and hyper-parameters is enormous, exhaustive validation requires a significant amount of time. Many existing studies use Bayesian optimization (BO) to accelerate the search. A significant difficulty is that, in general, each candidate ML algorithm has its own hyper-parameter space. BO-based approaches typically build a surrogate model independently for each hyper-parameter space, so sufficient observations are required for every candidate ML algorithm. In this study, our proposed method embeds the different hyper-parameter spaces into a shared latent space, in which a multi-task surrogate model for BO is estimated. This approach shares information among observations from different ML algorithms, so efficient optimization is expected with a smaller total number of observations. We further propose pre-training the latent space embedding with adversarial regularization, and a ranking model for selecting an effective pre-trained embedding for a given target dataset. Our empirical study demonstrates the effectiveness of the proposed method on datasets from OpenML.
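
To make the shared-latent-space idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical algorithms (`svm`, `random_forest`) with hyper-parameter spaces of different dimension, replaces the learned embeddings with fixed random linear maps followed by `tanh`, and uses a fixed task covariance `B` in a standard product (coregionalization-style) multi-task kernel. The objective `toy_score` and the UCB acquisition rule are likewise illustrative assumptions; none of these names or values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate algorithms and the dimension of each hyper-parameter space.
DIMS = {"svm": 2, "random_forest": 3}
TASK = {"svm": 0, "random_forest": 1}
LATENT_DIM = 2

# Assumption: fixed random linear maps stand in for the learned embeddings.
EMBED = {name: rng.normal(size=(d, LATENT_DIM)) for name, d in DIMS.items()}
# Assumption: fixed positive-definite task covariance coupling the two algorithms.
B = np.array([[1.0, 0.8], [0.8, 1.0]])


def to_latent(name, x):
    """Map hyper-parameters of algorithm `name` into the shared latent space."""
    return np.tanh(np.atleast_2d(x) @ EMBED[name])


def kernel(u_a, t_a, u_b, t_b, length_scale=1.0):
    """Multi-task kernel: task covariance times an RBF kernel on latent points."""
    sq = ((u_a[:, None, :] - u_b[None, :, :]) ** 2).sum(-1)
    return B[np.ix_(t_a, t_b)] * np.exp(-0.5 * sq / length_scale**2)


def posterior(u_tr, t_tr, y_tr, u_te, t_te, noise=1e-3):
    """GP posterior mean and variance at test points (constant prior mean)."""
    y_mean = y_tr.mean()
    k = kernel(u_tr, t_tr, u_tr, t_tr) + noise * np.eye(len(y_tr))
    k_s = kernel(u_te, t_te, u_tr, t_tr)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_tr - y_mean))
    v = np.linalg.solve(chol, k_s.T)
    var = B[t_te, t_te] - (v**2).sum(axis=0)
    return k_s @ alpha + y_mean, np.maximum(var, 0.0)


def toy_score(name, x):
    """Made-up stand-in for validation accuracy; not a result from the paper."""
    return float(np.exp(-np.sum((np.asarray(x) - 0.5) ** 2)))


def suggest(observations, n_candidates=20, beta=2.0):
    """Pick the (algorithm, hyper-parameters) pair with the highest UCB."""
    u_tr = np.vstack([to_latent(n, x) for n, x, _ in observations])
    t_tr = np.array([TASK[n] for n, _, _ in observations])
    y_tr = np.array([y for _, _, y in observations])
    best = None
    for name, d in DIMS.items():
        cand = rng.uniform(size=(n_candidates, d))
        t_te = np.full(n_candidates, TASK[name])
        mean, var = posterior(u_tr, t_tr, y_tr, to_latent(name, cand), t_te)
        ucb = mean + beta * np.sqrt(var)
        i = int(np.argmax(ucb))
        if best is None or ucb[i] > best[2]:
            best = (name, cand[i], float(ucb[i]))
    return best


# Three toy observations per algorithm, then one joint suggestion.
observations = []
for name, d in DIMS.items():
    for x in rng.uniform(size=(3, d)):
        observations.append((name, x, toy_score(name, x)))

name, x, ucb = suggest(observations)
print(name, x, ucb)
```

Because all observations feed one surrogate, an evaluation of one algorithm also updates the posterior for the other through `B` and the shared latent coordinates. This is the information sharing that independent per-algorithm surrogates lack.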

Paper Structure

This paper contains 28 sections, 26 equations, 10 figures, 5 tables, 2 algorithms.

Figures (10)

  • Figure 1: Overview of Proposed Framework.
  • Figure 2: Schematic illustration of MTGP on latent space. (a) Independent GPs are fitted to each ML algorithm separately. (b) The MTGP is fitted in the latent space, by which information from different ML algorithms is shared.
  • Figure 3: Schematic illustrations of quadratic surface prior obtained by pre-training (latent dimension is two). Each color corresponds to $\boldsymbol{\Lambda}^{(m)}$ embedded in ${\mathcal{U}}$.
  • Figure 4: Illustration of objective function in pre-training. $\mathrm{Acc}$ is fitted by the quadratic function through ${\mathcal{L}}^{(\mathrm{Pre-train})}$, while ${\mathcal{L}}_{(\mathrm{CE})}$ encourages sharing the latent space among different ML algorithms.
  • Figure 5: Ranking-based performance comparison (average ranking over $10$ runs on $40$ target datasets).
  • ...and 5 more figures