Bayesian Optimization for Simultaneous Selection of Machine Learning Algorithms and Hyperparameters on Shared Latent Space
Kazuki Ishikawa, Ryota Ozaki, Yohei Kanzaki, Ichiro Takeuchi, Masayuki Karasuyama
TL;DR
The paper tackles the Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem by embedding the heterogeneous hyper-parameter spaces of multiple ML algorithms into a shared latent space and running Bayesian optimization there. It introduces a multi-task Gaussian process surrogate on the latent space, coupled with a learnable embedding for each algorithm, and adds pre-training with adversarial regularization plus a learning-to-rank-based selection of pre-trained embedding models (PTEMs) for new datasets. Together, these three components enable information sharing across algorithms, improve early search efficiency, and reduce the total number of observations required to identify high-performing CASH configurations. Empirical results on OpenML datasets show the approach outperforming strong baselines, with ablations confirming the value of pre-training and of the ranking-based PTEM selection for practical AutoML workflows.
Abstract
Selecting the optimal combination of a machine learning (ML) algorithm and its hyper-parameters is crucial for developing high-performance ML systems. However, because the number of possible combinations of ML algorithms and hyper-parameters is enormous, exhaustive validation requires a significant amount of time. Many existing studies use Bayesian optimization (BO) to accelerate the search. A significant difficulty, however, is that each candidate ML algorithm generally has its own hyper-parameter space. BO-based approaches typically build a surrogate model independently for each hyper-parameter space, so sufficient observations are required for every candidate ML algorithm. In this study, our proposed method embeds the different hyper-parameter spaces into a shared latent space, in which a multi-task surrogate model for BO is estimated. This approach shares information among observations from different ML algorithms, so efficient optimization is expected with a smaller total number of observations. We further propose pre-training the latent space embedding with an adversarial regularization, and a ranking model for selecting an effective pre-trained embedding for a given target dataset. Our empirical study demonstrates the effectiveness of the proposed method on datasets from OpenML.
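The shared-latent-space idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: the two algorithm names, their hyper-parameter dimensions, the fixed random linear embeddings (the paper learns and pre-trains its embeddings), the fixed task covariance `B`, the made-up validation scores, and the upper-confidence-bound acquisition rule. It shows only how observations from algorithms with different hyper-parameter spaces can feed one multi-task Gaussian process once they are mapped into a common space.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 2

# Hypothetical candidate algorithms with different hyper-parameter dimensions.
HP_DIMS = {"svm": 2, "random_forest": 3}
ALGOS = list(HP_DIMS)

# Stand-in embeddings: one fixed random linear map per algorithm.
# Assumption: used only for illustration; the paper learns these maps.
W = {a: rng.normal(size=(d, LATENT_DIM)) for a, d in HP_DIMS.items()}

def embed(algo, x):
    """Map a hyper-parameter vector of `algo` into the shared latent space."""
    return np.tanh(np.asarray(x, dtype=float) @ W[algo])

# Assumed covariance between the two algorithms (fixed here for simplicity).
B = np.array([[1.0, 0.5], [0.5, 1.0]])

def kernel(Z1, t1, Z2, t2, length_scale=1.0):
    """Multi-task kernel: B[t, t'] times an RBF kernel on latent points."""
    sq = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(-1)
    return B[np.ix_(t1, t2)] * np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(Z, t, y, Zs, ts, noise=1e-4):
    """Zero-mean GP posterior mean and variance at latent points Zs."""
    K = kernel(Z, t, Z, t) + noise * np.eye(len(y))
    Ks = kernel(Zs, ts, Z, t)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(kernel(Zs, ts, Zs, ts)) - (v**2).sum(0)
    return Ks @ alpha, np.maximum(var, 0.0)

# Made-up observations: (algorithm, hyper-parameters, validation score).
obs = [("svm", [0.1, 0.9], 0.71),
       ("svm", [0.5, 0.5], 0.80),
       ("random_forest", [0.2, 0.4, 0.6], 0.78)]
Z = np.stack([embed(a, x) for a, x, _ in obs])
t = np.array([ALGOS.index(a) for a, _, _ in obs])
y = np.array([s for _, _, s in obs])

# Candidates from both algorithms are scored by the same surrogate.
candidates = [("svm", [0.3, 0.7]),
              ("random_forest", [0.8, 0.1, 0.5]),
              ("random_forest", [0.2, 0.5, 0.6])]
Zc = np.stack([embed(a, x) for a, x in candidates])
tc = np.array([ALGOS.index(a) for a, _ in candidates])

mean, var = gp_posterior(Z, t, y - y.mean(), Zc, tc)
ucb = mean + y.mean() + 2.0 * np.sqrt(var)  # upper confidence bound
best = candidates[int(np.argmax(ucb))]
print("next configuration to evaluate:", best)
```

Because all observations live in one latent space and the kernel couples the two algorithms through `B`, a score observed for one algorithm also shifts the prediction for the other, which is the information sharing the abstract refers to. A per-algorithm surrogate would instead need enough observations of its own before making useful predictions.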
