Modeling All Response Surfaces in One for Conditional Search Spaces

Jiaxing Li; Wei Liu; Chao Xue; Yibing Zhan; Xiaoxing Wang; Weifeng Liu; Dacheng Tao

TL;DR

Bayesian Optimization under conditional, tree-structured search spaces is challenging due to varying subspace structures and limited cross-subspace information sharing. The paper introduces AttnBO, an attention-based framework that uses structure-aware hyperparameter embeddings and a Transformer-based encoder to map all subspaces into a unified latent space for a single Gaussian Process surrogate, enabling efficient, batch Bayesian optimization. Empirical results on a tree-structured simulation, Neural Architecture Search, OpenML CASH tasks, and the HPO-B benchmark show improved sample efficiency and performance over strong baselines, illustrating effective cross-subspace learning and practical impact for AutoML.
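The idea of a single Gaussian Process over a unified latent space can be written in the usual deep-kernel form. The sketch below is an assumption-laden summary, not the paper's exact notation: $g_\theta$ stands for the attention-based encoder, $k_{\mathrm{base}}$ for any standard kernel (for example RBF or Matérn), and $d$ for the latent dimension.

$$
g_\theta : \bigcup_{i} \chi^i \to \mathbb{R}^{d}, \qquad
k_\theta(x, x') = k_{\mathrm{base}}\big(g_\theta(x),\, g_\theta(x')\big), \qquad
f \sim \mathcal{GP}\big(m(\cdot),\, k_\theta(\cdot, \cdot)\big)
$$

Because $g_\theta$ maps every configuration to the same $\mathbb{R}^{d}$ regardless of which subspace $\chi^i$ it comes from, observations from all subspaces can contribute to one surrogate.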

Abstract

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. In many practical AutoML scenarios, however, hyperparameters depend on one another, forming a conditional search space that can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, which makes BO difficult to apply. Some previous BO works address this by building a separate Gaussian Process (GP) model for each subspace. These approaches tend to be inefficient: they require a substantial number of observations to guarantee each GP's performance, and they cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach that models the response surfaces of all subspaces in one and captures the relationships between hyperparameters via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. We then introduce an attention-based deep feature extractor that projects configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. Empirical results on a simulation function, various real-world tasks, and the HPO-B benchmark demonstrate that the proposed approach improves the efficacy and efficiency of BO within conditional search spaces.
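To make the partition into structurally distinct subspaces concrete, here is a minimal Python sketch. The search space, the hyperparameter names, and the `flat_subspaces` helper are all hypothetical illustrations, not the paper's code or its Figure 1 tree. Each branch of a conditional hyperparameter activates a different set of children, and each distinct set of active hyperparameters is one flat subspace.

```python
from itertools import product

# Hypothetical CASH-style space (an assumption for illustration only).
# A conditional hyperparameter maps each choice to the children it activates;
# an unconditional hyperparameter maps to None.
SPACE = {
    "algorithm": {"svm": ["C", "kernel"], "random_forest": ["n_estimators", "criterion"]},
    "kernel": {"rbf": ["gamma"], "poly": ["degree"], "linear": []},
    "C": None,
    "gamma": None,
    "degree": None,
    "n_estimators": None,
    "criterion": None,
}


def flat_subspaces(root, space):
    """Return one list of active hyperparameters per flat subspace."""

    def expand(name):
        branches = space[name]
        if branches is None:  # unconditional leaf: always a single option
            return [[name]]
        results = []
        for choice, children in branches.items():
            # Combine every way of expanding each activated child.
            child_options = [expand(child) for child in children]
            for combo in product(*child_options):
                active = [f"{name}={choice}"]
                for part in combo:
                    active.extend(part)
                results.append(active)
        return results

    return expand(root)


subspaces = flat_subspaces("algorithm", SPACE)
for i, active in enumerate(subspaces, start=1):
    print(f"subspace {i} (dim {len(active)}): {active}")
```

In this toy space the enumeration yields four subspaces with differing dimensionality, which is the situation that prevents a single fixed-length input vector from describing every configuration.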

Paper Structure (46 sections, 6 equations, 18 figures, 2 tables, 1 algorithm)

Introduction
BO for Conditional Search Space
Preliminaries
Deep Kernel Learning for Gaussian Process
Self-Attention Mechanism
Methodology
Problem Formulation
Structure-aware Embeddings
BO with Attention-based DKGP
Experiments
Experimental setting
Baselines.
Implementation details.
Other settings.
Experiment Analysis
...and 31 more sections

Figures (18)

Figure 1: An example of a tree-structured search space for a CASH task. The space contains two popular algorithms and their distinct hyperparameters. Following the dependencies among hyperparameters, the search space forms a tree of 9 nodes and can be partitioned into 6 flat subspaces. Approaches that use separate GPs build a model $GP_i$ for each subspace and run the optimization within each subspace. Add-tree [addTree_tpami] builds a kernel on each node and combines them under an additive assumption. Our method instead builds a unified surrogate model $\mathcal{M}$ for all subspaces $\chi^1, \dots, \chi^6$.
Figure 2: The framework of AttnBO. We introduce three elements (the identity, the index, and the father's identity of a hyperparameter) that preserve structural features for each hyperparameter. Each hyperparameter embedding can be considered a token, so a configuration can be viewed as a sequence of tokens. We then employ an attention-based encoder to capture the relationships among these tokens and project all sequences into a unified latent space where a GP-based BO can work directly.
Figure 3: Performance of our AttnBO and baselines on the conditional simulation objective function.
Figure 4: Performance of baselines and AttnBO on the complex NAS space.
Figure 5: Average rankings of various methods on three machine-learning tasks evaluated on real-world OpenML datasets.
...and 13 more figures
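The pipeline described in the Figure 2 caption (structure-aware tokens, an attention-based encoder, a fixed-size latent vector) can be sketched in plain Python. Everything below is an illustrative assumption rather than the authors' implementation: the weights are random and untrained, a single attention head with mean pooling stands in for the Transformer-based encoder, "index" is read as the token's position in the sequence, and the way the value enters the token (scaled by `w_value`) is hypothetical.

```python
import math
import random

D = 4  # latent width, chosen arbitrarily for this sketch
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical hyperparameter vocabulary; "<root>" is the father of top-level nodes.
NAMES = ["<root>", "algorithm", "C", "kernel", "gamma", "n_estimators"]
ID = {name: i for i, name in enumerate(NAMES)}
E_ident = rand_matrix(len(NAMES), D)   # identity embedding table
E_father = rand_matrix(len(NAMES), D)  # father-identity embedding table
E_index = rand_matrix(8, D)            # index embedding, up to 8 tokens
w_value = rand_matrix(1, D)[0]         # assumed linear value projection
W_q, W_k, W_v = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)


def embed(config):
    """config: list of (name, father_name, value) -> list of D-dim tokens."""
    tokens = []
    for index, (name, father, value) in enumerate(config):
        tokens.append([
            E_ident[ID[name]][j] + E_father[ID[father]][j]
            + E_index[index][j] + value * w_value[j]
            for j in range(D)
        ])
    return tokens


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def encode(config):
    """Single-head self-attention followed by mean pooling over tokens."""
    x = embed(config)
    q, k, v = matmul(x, W_q), matmul(x, W_k), matmul(x, W_v)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # token-to-token similarities
    attn = [softmax([s / math.sqrt(D) for s in row]) for row in scores]
    out = matmul(attn, v)
    return [sum(col) / len(out) for col in zip(*out)]  # fixed-size vector


# Two configurations from different subspaces, with different token counts.
svm_cfg = [("algorithm", "<root>", 0.0), ("C", "algorithm", 0.3),
           ("kernel", "algorithm", 0.0), ("gamma", "kernel", 0.7)]
rf_cfg = [("algorithm", "<root>", 1.0), ("n_estimators", "algorithm", 0.5)]

z_svm, z_rf = encode(svm_cfg), encode(rf_cfg)
print(len(z_svm), len(z_rf))
```

Both configurations map to vectors of the same length `D` even though one has four tokens and the other two, which is what allows a standard kernel, and hence a single GP, to compare them.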
