Provable Accelerated Bayesian Optimization with Knowledge Transfer

Haitao Lin, Boxin Zhao, Mladen Kolar, Chong Liu

TL;DR

DeltaBO accelerates Bayesian optimization on a target task by transferring knowledge from related source tasks. It models the target as $f = g + \delta$, where $g$ and $\delta$ are drawn from independent Gaussian processes, and uses the source posterior to form unbiased, noisy observations of $\delta$, yielding an acquisition function with improved uncertainty quantification. The regret bound is $\tilde{\mathcal{O}}(\sqrt{T(T/N + \gamma_\delta)})$, which improves on standard BO when source data is plentiful ($N \gg T$) and the difference function is easier to learn than the target ($\gamma_\delta \ll \gamma_f$). Experiments on real AutoML and synthetic benchmarks show DeltaBO outperforming the baselines, consistent with the theory.
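The decomposition $f = g + \delta$ can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the functions `g` and `delta`, the RBF kernel, its hyperparameters, and the UCB-style selection rule at the end are all hypothetical choices made here. The only idea taken from the summary is that a target observation minus the source posterior mean acts as an unbiased, noisy observation of $\delta$, whose noise variance is inflated by the source posterior variance. Treating those residual noises as independent across points is a further simplification.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2, amplitude=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var, **kern):
    """Posterior mean and variance of a zero-mean GP at x_query.

    noise_var may be a scalar or one value per training point.
    """
    noise = np.broadcast_to(noise_var, x_train.shape)
    K = rbf(x_train, x_train, **kern) + np.diag(noise)
    Ks = rbf(x_train, x_query, **kern)
    sol = np.linalg.solve(K, Ks)                  # K^{-1} Ks
    mean = sol.T @ y_train
    prior_var = kern.get("amplitude", 1.0) ** 2   # k(x, x) for the RBF kernel
    var = prior_var - np.sum(Ks * sol, axis=0)
    return mean, np.maximum(var, 0.0)

rng = np.random.default_rng(0)
g = lambda x: np.sin(6 * x)        # hypothetical source function
delta = lambda x: 0.2 * x          # hypothetical (small) difference function
noise_var = 0.01

# Plentiful source data (N = 200) and a handful of target evaluations.
x_src = rng.uniform(0, 1, 200)
y_src = g(x_src) + rng.normal(0, np.sqrt(noise_var), 200)
x_tgt = rng.uniform(0, 1, 5)
y_tgt = g(x_tgt) + delta(x_tgt) + rng.normal(0, np.sqrt(noise_var), 5)

# Pseudo-observations of delta: target value minus source posterior mean.
mu_g, var_g = gp_posterior(x_src, y_src, x_tgt, noise_var)
delta_obs = y_tgt - mu_g
delta_noise_var = noise_var + var_g   # observation noise + source uncertainty

# Posterior on delta, then an assumed UCB-style score for f = g + delta.
grid = np.linspace(0, 1, 101)
mu_d, var_d = gp_posterior(x_tgt, delta_obs, grid, delta_noise_var,
                           lengthscale=0.5, amplitude=0.3)
mu_g_grid, var_g_grid = gp_posterior(x_src, y_src, grid, noise_var)
beta = 2.0                            # hypothetical exploration weight
ucb = mu_g_grid + mu_d + beta * np.sqrt(var_g_grid + var_d)
x_next = grid[np.argmax(ucb)]
```

With many source points, `var_g` is small, so the pseudo-observations of $\delta$ are nearly as clean as direct ones. This is the intuition behind the $T/N$ term shrinking when $N \gg T$.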

Abstract

We study how Bayesian optimization (BO) can be accelerated on a target task with historical knowledge transferred from related source tasks. Existing works on BO with knowledge transfer either do not have theoretical guarantees or achieve the same regret as BO in the non-transfer setting, $\tilde{\mathcal{O}}(\sqrt{T \gamma_f})$, where $T$ is the number of evaluations of the target function and $\gamma_f$ denotes its information gain. In this paper, we propose the DeltaBO algorithm, in which a novel uncertainty-quantification approach is built on the difference function $\delta$ between the source and target functions, which are allowed to belong to different reproducing kernel Hilbert spaces (RKHSs). Under mild assumptions, we prove that the regret of DeltaBO is of order $\tilde{\mathcal{O}}(\sqrt{T (T/N + \gamma_\delta)})$, where $N$ denotes the number of evaluations from source tasks and typically $N \gg T$. In many applications, source and target tasks are similar, which implies that $\gamma_\delta$ can be much smaller than $\gamma_f$. Empirical studies on both real-world hyperparameter tuning tasks and synthetic functions show that DeltaBO outperforms other baseline methods and support our theoretical claims.
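To see what the two rates imply numerically, the following sketch evaluates $\sqrt{T \gamma_f}$ and $\sqrt{T(T/N + \gamma_\delta)}$ for a few settings. Constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are ignored, and the values chosen for $T$, $N$, $\gamma_f$, and $\gamma_\delta$ are hypothetical, picked only for illustration.

```python
import math

def standard_rate(T, gamma_f):
    """Non-transfer BO rate sqrt(T * gamma_f), constants and logs dropped."""
    return math.sqrt(T * gamma_f)

def deltabo_rate(T, N, gamma_delta):
    """DeltaBO rate sqrt(T * (T/N + gamma_delta)), constants and logs dropped."""
    return math.sqrt(T * (T / N + gamma_delta))

T = 100
gamma_f, gamma_delta = 20.0, 2.0   # hypothetical information gains

baseline = standard_rate(T, gamma_f)               # sqrt(2000)
plentiful = deltabo_rate(T, 10_000, gamma_delta)   # sqrt(201),   N >> T
matched = deltabo_rate(T, 100, gamma_delta)        # sqrt(300),   N == T
scarce = deltabo_rate(T, 1, gamma_delta)           # sqrt(10200), N << T

for name, value in [("standard BO", baseline), ("N=10000", plentiful),
                    ("N=100", matched), ("N=1", scarce)]:
    print(f"{name:>12}: {value:.2f}")
```

Under these made-up numbers the transfer rate is well below the baseline when $N \gg T$, and exceeds it when the source data is scarce, because the $T/N$ term then dominates.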

Paper Structure

This paper contains 31 sections, 7 theorems, 86 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\rho \in (0,1)$ denote the error tolerance probability. Assume that the decision set $\mathcal{D}$ is finite with cardinality $\vert \mathcal{D} \vert$, and that the source dataset $\mathcal{S}^{(0)}$ contains $N$ observations of $g$. Consider running DeltaBO with [...]. Then, under Assumption asm:add, with probability at least $1-\rho$, for all $T \geq 1$, the cumulative regret satisfies [...], where [...]

Figures (2)

  • Figure 1: Cumulative regrets of all compared algorithms.
  • Figure 2: Average regrets of all compared algorithms.
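The two figures plot cumulative and average regret. A minimal sketch of how these curves are commonly computed from a run follows, assuming a maximization problem with known optimum `f_opt` and instantaneous regret $r_t = f(x^\star) - f(x_t)$. The function name and the sample values are hypothetical, and the paper's exact plotting convention is not shown in this summary.

```python
import numpy as np

def regret_curves(f_values, f_opt):
    """Return (cumulative, average) regret for a sequence of evaluated f(x_t).

    Assumes maximization: instantaneous regret is f_opt - f(x_t).
    """
    inst = f_opt - np.asarray(f_values, dtype=float)
    cumulative = np.cumsum(inst)
    average = cumulative / np.arange(1, len(inst) + 1)
    return cumulative, average

# Hypothetical run of four evaluations approaching the optimum 1.0.
cumulative, average = regret_curves([0.2, 0.5, 0.9, 1.0], f_opt=1.0)
```

A sublinear cumulative regret curve corresponds to an average regret curve that decays toward zero, which is why the two plots are usually read together.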

Theorems & Definitions (14)

  • Theorem 1: Cumulative regret bound of DeltaBO
  • Corollary 1
  • Remark
  • Proposition 1: Information gain with amplitude scaling
  • Remark : Two drivers of reduced information gain
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • ...and 4 more
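The statement of Proposition 1 is not reproduced in this summary, so the following is only a generic illustration of why kernel amplitude matters for information gain, not a reconstruction of the proposition. It uses the standard Gaussian process expression $\tfrac{1}{2}\log\det(I + \sigma^{-2} K)$ for the information gained at a fixed set of points. The RBF kernel, lengthscale, noise level, and amplitudes are assumptions made here.

```python
import numpy as np

def info_gain(x, amplitude, lengthscale=0.2, noise_var=0.01):
    """0.5 * log det(I + K / noise_var) for an RBF kernel on 1-D points x."""
    d = x[:, None] - x[None, :]
    K = amplitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)
    _, logdet = np.linalg.slogdet(np.eye(len(x)) + K / noise_var)
    return 0.5 * logdet

x = np.linspace(0, 1, 30)
# Smaller amplitude (e.g. a small difference function) gives smaller gain.
gains = {a: info_gain(x, a) for a in (1.0, 0.3, 0.1)}
for a, gain in gains.items():
    print(f"amplitude={a}: information gain={gain:.3f}")
```

Shrinking the amplitude scales every eigenvalue of $K$ down, so the log-determinant decreases. This matches the intuition that a small difference function $\delta$ can have $\gamma_\delta \ll \gamma_f$.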