MALIBO: Meta-learning for Likelihood-free Bayesian Optimization

Jiarong Pan; Stefan Falkner; Felix Berkenkamp; Joaquin Vanschoren

TL;DR

MALIBO tackles the scalability and robustness weaknesses of surrogate-based meta-learning BO by learning a likelihood-free acquisition function across related tasks. It combines a meta-learned, task-agnostic feature space with a task embedding to capture uncertainty, and augments this with Thompson sampling for exploration and a gradient-boosted residual predictor for robust adaptation to unseen tasks. The method replaces costly surrogates with a probabilistic classifier whose output governs query utility, and uses a Laplace-based approximation to propagate task uncertainty into acquisition decisions. Empirical results across AutoML benchmarks show MALIBO achieves strong anytime performance, scalable runtime, and robustness to heterogeneous noise, outperforming a wide range of baselines.
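The interplay of the task-agnostic part, the task embedding, and Thompson sampling can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature map `features`, the linear form of the mean prediction, the inner product between embedding and features, the diagonal Gaussian standing in for the Laplace approximation, and all numeric values are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def features(x):
    """Hypothetical stand-in for the meta-learned, task-agnostic feature map."""
    return [1.0, x, x * x]

def acquisition(x, mean_weights, z):
    """Class probability: sigmoid of task-agnostic mean plus task-specific term.

    Assumption: both terms are linear in the features.
    """
    phi = features(x)
    return sigmoid(dot(mean_weights, phi) + dot(z, phi))

def sample_embedding(z_map, z_std, rng):
    """Draw one task embedding from a diagonal Gaussian around the MAP estimate.

    Assumption: the Gaussian posterior over z is taken to be diagonal here.
    """
    return [rng.gauss(m, s) for m, s in zip(z_map, z_std)]

def propose(candidates, mean_weights, z_map, z_std, rng):
    """Thompson sampling: maximize the acquisition under one sampled embedding."""
    z = sample_embedding(z_map, z_std, rng)
    return max(candidates, key=lambda x: acquisition(x, mean_weights, z))

rng = random.Random(0)
candidates = [i / 10 for i in range(11)]
mean_weights = [0.0, 4.0, -4.0]  # made-up weights; mean logit peaks at x = 0.5
z_map, z_std = [0.0, 0.0, 0.0], [0.1, 0.5, 0.5]
next_x = propose(candidates, mean_weights, z_map, z_std, rng)
```

With zero posterior spread the proposal collapses to the maximizer of the MAP acquisition; a larger spread makes successive proposals vary, which is the source of exploration in this sketch.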

Abstract

Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
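To make "directly learns the utility of queries" concrete, the sketch below shows the labelling step commonly used in likelihood-free BO: observations are split at a quantile threshold, and a classifier trained on the resulting binary labels serves as the acquisition function. The helper names, the example values, and the choice of minimization are assumptions for illustration only.

```python
import math

def quantile_threshold(ys, gamma):
    """Empirical gamma-quantile of the observed objective values.

    Assumption: lower objective values are better (minimization).
    """
    ordered = sorted(ys)
    # Number of observations that fall into the best gamma-fraction (at least one).
    k = max(1, math.ceil(gamma * len(ordered)))
    return ordered[k - 1]

def label_observations(ys, gamma):
    """Label each observation 1 if it is among the top performers, else 0."""
    tau = quantile_threshold(ys, gamma)
    return tau, [1 if y <= tau else 0 for y in ys]

# Made-up observations from a single task.
ys = [0.9, 0.2, 0.5, 0.1, 0.7, 0.4]
tau, labels = label_observations(ys, gamma=1 / 3)
```

Because the labels depend only on the rank of each observation within its own task, this step is insensitive to differences in output scale across tasks.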

Paper Structure (55 sections, 34 equations, 30 figures, 4 tables, 1 algorithm)

This paper contains 55 sections, 34 equations, 30 figures, 4 tables, 1 algorithm.

Introduction
Related Work
Meta-learning Bayesian optimization
Likelihood-free acquisition functions
Background
Meta-learning Bayesian optimization
Likelihood-free acquisition functions
Methodology
Network structure
Meta-learning
Task adaptation
Uncertainty-based exploration
Gradient boosting as a residual prediction model
Experiments
Baselines
...and 40 more sections

Figures (30)

Figure 1: Meta-learning the acquisition function. Left: The top panel shows observations from 10 related tasks and the target task. The top performing observations ($\tau = \Phi^{-1}(\gamma), \gamma=1/3$) in each task are shown in red, the rest in blue. The bottom panel shows the maximum-a-posteriori estimate of the acquisition function in solid blue while the Thompson samples are shown as dashed curves. Right: Features learned by our model. MALIBO successfully identifies the promising areas in the input space, while the Thompson samples show variability in the meta-learned acquisition function.
Figure 2: Schematic representation of our meta-learning classifier. A residual feedforward network (ResFFN) maps the input $\mathbf{x}$ via a shared feature mapping function $\phi$. From this, we construct a task-agnostic mean prediction $m(\bm\Phi)$ and a task embedding $\mathbf{z}_t$, which is distributed according to a prior distribution $p(\mathcal{Z})$. The feature mapping function $\phi$ and mean prediction layer $m$ are fixed after meta-training, denoted by the task-agnostic component $g_{\bm \omega}$. Finally, we add the two terms and convert the sum to a class prediction via the sigmoid function.
Figure 3: Effects of exploration and residual predictions. Colored circles denote the optimization queries (from bright to dark), the dashed curve denotes a Thompson sample (TS) of the acquisition function, and the orange curve shows the sample combined with gradient boosting (GB).
Figure 4: Runtime of different BO algorithms over optimization steps. We show the typical results for two benchmarks and plot the medial inter-quantiles to remove outliers.
Figure 5: Aggregated normalized regrets for BO algorithms on real-world AutoML problems.
...and 25 more figures
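The combination of a Thompson sample with gradient boosting described in the Figure 3 caption can be illustrated with a toy residual model. The sketch below is an assumption-laden illustration, not the paper's implementation: it treats the sampled acquisition logit as the initial prediction and adds regression stumps fitted to the logistic-loss gradient on the target-task labels. The 1-D inputs, the stump learner, and the values of `rounds` and `lr` are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_stump(xs, residuals):
    """Least-squares regression stump (single split) on 1-D inputs."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not right:
            continue  # a split must leave points on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, split, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]

def boost_residuals(xs, labels, prior_logit, rounds=20, lr=0.5):
    """Return a logit function: prior logit plus boosted residual stumps."""
    stumps = []

    def logit(x):
        correction = sum(lr * (lv if x <= s else rv) for s, lv, rv in stumps)
        return prior_logit(x) + correction

    for _ in range(rounds):
        # Negative gradient of the logistic loss with respect to the logit.
        grads = [y - sigmoid(logit(x)) for x, y in zip(xs, labels)]
        stumps.append(fit_stump(xs, grads))
    return logit

# Made-up target-task data that the (flat) prior logit does not explain.
xs = [0.1, 0.2, 0.8, 0.9]
labels = [1, 1, 0, 0]
combined = boost_residuals(xs, labels, prior_logit=lambda x: 0.0)
```

When the prior logit already fits the target-task labels, the gradients are small and the correction stays close to zero; the stumps only matter where the meta-learned prediction and the new observations disagree.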
