Clustering-based Meta Bayesian Optimization with Theoretical Guarantee

Khoa Nguyen, Viet Huynh, Binh Tran, Tri Pham, Tin Huynh, Thin Nguyen

TL;DR

This work addresses meta-BO under heterogeneous and large-scale historical tasks by introducing cm-BO, which clusters GP posteriors from past tasks into homogeneous groups and builds cluster prototypes as geometric centers or Wasserstein barycenters. It then online-synthesizes an adaptive meta-prior as a weighted sum of prototypes, with weights updated via distance-based metrics to reduce negative transfer. The authors establish regret guarantees for the geometric-center variant and demonstrate robust, scalable performance on real-world hyperparameter optimization benchmarks, often surpassing non-meta and other meta-BO baselines. The approach offers practical benefits for efficient global optimization across dissimilar tasks and provides a solid foundation for integrating more sophisticated transport-based priors in future work.

Abstract

Bayesian Optimization (BO) is a well-established method for addressing black-box optimization problems. In many real-world scenarios, optimization involves multiple functions, which makes it important to leverage data and learned functions from prior tasks to improve efficiency in the current task. To expedite convergence to the global optimum, recent studies have introduced meta-learning strategies, collectively referred to as meta-BO, to incorporate knowledge from historical tasks. However, in practical settings, the underlying functions are often heterogeneous, which can adversely affect optimization performance for the current task. Additionally, when the number of historical tasks is large, meta-BO methods face significant scalability challenges. In this work, we propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach (1) effectively partitions transferred meta-functions into highly homogeneous clusters, (2) learns geometry-based surrogate prototypes that capture the structural patterns within each cluster, and (3) adaptively synthesizes meta-priors during the online phase using statistical distance-based weighting policies. Experimental results on real-world hyperparameter optimization (HPO) tasks, combined with theoretical guarantees, demonstrate the robustness and effectiveness of our method in overcoming these challenges.
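
The three steps in the abstract (cluster, build prototypes, weight prototypes online) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each historical GP posterior is summarized on a shared grid by a mean vector and a per-point standard deviation (a diagonal covariance), in which case the 2-Wasserstein distance between two posteriors reduces to a Euclidean distance on the stacked (mean, std) vectors. The function names `w2_diag`, `cluster_posteriors`, `prototype_weights`, and `meta_prior`, the farthest-point initialization, and the softmax weighting with a `temperature` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def w2_diag(m1, s1, m2, s2):
    """2-Wasserstein distance between Gaussians with diagonal covariances."""
    return np.sqrt(np.sum((m1 - m2) ** 2) + np.sum((s1 - s2) ** 2))

def cluster_posteriors(means, stds, k, n_iter=50):
    """Lloyd-style clustering of posteriors in stacked (mean, std) space.

    Under the diagonal-covariance assumption, Euclidean distance in this
    space equals the 2-Wasserstein distance, and each cluster average is
    the corresponding Wasserstein barycenter.
    """
    feats = np.hstack([means, stds])
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [feats[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(feats - c, axis=1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            feats[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    g = means.shape[1]
    return labels, centers[:, :g], centers[:, g:]

def prototype_weights(m_t, s_t, proto_m, proto_s, temperature=1.0):
    """Softmax over negative distances from the current task to each prototype."""
    d = np.array([w2_diag(m_t, s_t, pm, ps) for pm, ps in zip(proto_m, proto_s)])
    logits = -d / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

def meta_prior(weights, proto_m, proto_s):
    """Weighted sum of prototypes, treating the prototypes as independent."""
    mean = weights @ proto_m
    std = np.sqrt((weights ** 2) @ (proto_s ** 2))
    return mean, std

# Synthetic demo: two families of historical tasks on a 20-point grid.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 20)
base = np.sin(2 * np.pi * grid)
means = np.vstack(
    [base + 0.05 * rng.standard_normal(20) for _ in range(5)]
    + [-base + 0.05 * rng.standard_normal(20) for _ in range(5)]
)
stds = np.full_like(means, 0.1)

labels, proto_m, proto_s = cluster_posteriors(means, stds, k=2)
# The current task resembles the first family, so its prototype should dominate.
weights = prototype_weights(base, np.full(20, 0.1), proto_m, proto_s)
prior_mean, prior_std = meta_prior(weights, proto_m, proto_s)
```

In this toy setting, nearly all weight falls on the prototype of the family that matches the current task, which is the mechanism the abstract credits with limiting the influence of dissimilar historical tasks.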

Paper Structure

This paper contains 13 sections, 18 theorems, 24 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem

Suppose that $\mathcal{GP}^{(1)},\mathcal{GP}^{(2)},\ldots,\mathcal{GP}^{(N)}$ are $N$ independent GPs over Euclidean space $\mathbb{R}^{d}$. Their linear combination $\hat{\mathcal{GP}}:=\sum_{i=1}^{N}a_{i}\mathcal{GP}^{(i)}$ (where $a_{i}\in\mathbb{R}_{+}$) is also a GP over $\mathbb{R}^{d}$ (adler1).
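
For intuition, the mean and covariance of the combined process follow from linearity of expectation and independence. The short derivation below is a sketch; the symbols $f_i$, $m_i$, and $k_i$ (a sample path, mean function, and kernel of $\mathcal{GP}^{(i)}$) are notation introduced here for illustration and are not taken from the paper.

```latex
% Assumed notation: f_i \sim \mathcal{GP}^{(i)} with mean m_i and kernel k_i.
\hat{f}(x) := \sum_{i=1}^{N} a_i f_i(x), \qquad a_i \in \mathbb{R}_{+}

% Mean: linearity of expectation.
\mathbb{E}\big[\hat{f}(x)\big] = \sum_{i=1}^{N} a_i\, m_i(x)

% Covariance: cross terms vanish because the f_i are independent.
\mathrm{Cov}\big(\hat{f}(x), \hat{f}(x')\big)
  = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j\, \mathrm{Cov}\big(f_i(x), f_j(x')\big)
  = \sum_{i=1}^{N} a_i^{2}\, k_i(x, x')
```

Any finite collection of values $\hat{f}(x_1),\ldots,\hat{f}(x_n)$ is a sum of independent Gaussian vectors and is therefore Gaussian, which is why the combination is again a GP.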

Figures (3)

  • Figure 1: MNSR across all compared methods, on 3 types of AF from left to right: GP-UCB, EI, PI (experimental results on rpart.preproc search space)
  • Figure 2: Average ranking (left) and fraction of solvable BO runs at $C=0.005$ (right)
  • Figure 3: Mean NSR when varying the number of meta-task clusters for the two $\text{cm-BO}$ variants, WssClus_WssCMP and WssClus_WssCMP_Bary (experimental results from a train-test split seed in rpart.preproc meta-dataset).

Theorems & Definitions (25)

  • Theorem
  • Proposition
  • Remark
  • Theorem
  • Remark
  • Theorem
  • Lemma
  • Theorem
  • Proof
  • Proposition
  • ...and 15 more