High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling

Yuxuan Yin, Yu Wang, Peng Li

TL;DR

This work integrates the teacher-student paradigm into BO for the first time to minimize expensive labeled data queries, and proposes two optimized unlabeled data samplers that construct effective student feedback aligned with the objective of Bayesian optimization.

Abstract

We introduce a novel semi-supervised learning approach, named Teacher-Student Bayesian Optimization ($\texttt{TSBO}$), integrating the teacher-student paradigm into BO for the first time to minimize expensive labeled data queries. $\texttt{TSBO}$ incorporates a teacher model, an unlabeled data sampler, and a student model. The student is trained on unlabeled data locations generated by the sampler, with pseudo labels predicted by the teacher. The interplay between these three components applies a unique selective regularization to the teacher in the form of student feedback. This scheme enables the teacher to predict high-quality pseudo labels, enhancing the generalization of the GP surrogate model in the search space. To fully exploit $\texttt{TSBO}$, we propose two optimized unlabeled data samplers that construct effective student feedback aligned with the objective of Bayesian optimization. Furthermore, we quantify and leverage the uncertainty of the teacher-student model to provide reliable feedback to the teacher in the presence of risky pseudo-label predictions. $\texttt{TSBO}$ demonstrates significantly improved sample efficiency in several global optimization tasks under tight labeled data budgets.
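The teacher-student interplay described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: a kernel ridge regressor stands in for both teacher and student, unlabeled locations are drawn uniformly at random rather than by the optimized samplers, and the feedback is assumed to be the student's mean squared error on the labeled data. All function names and the toy objective are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between two row-wise point sets.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def fit_kernel_ridge(x, y, noise=1e-3):
    # Assumption: kernel ridge regression stands in for teacher and student.
    k = rbf_kernel(x, x) + noise * np.eye(len(x))
    alpha = np.linalg.solve(k, y)
    return lambda x_new: rbf_kernel(x_new, x) @ alpha


def student_feedback(teacher, x_unlabeled, x_labeled, y_labeled):
    # 1) The teacher predicts pseudo labels at the unlabeled locations.
    pseudo_labels = teacher(x_unlabeled)
    # 2) The student is trained only on (unlabeled location, pseudo label) pairs.
    student = fit_kernel_ridge(x_unlabeled, pseudo_labels)
    # 3) The student's error on labeled data is returned as feedback
    #    (assumed here to be a mean squared error).
    return float(np.mean((student(x_labeled) - y_labeled) ** 2))


def objective(x):
    # Toy stand-in for an expensive black-box objective.
    return np.sin(3.0 * x[:, 0])


def bad_teacher(x):
    # A teacher that ignores the labeled data entirely.
    return np.zeros(len(x))


rng = np.random.default_rng(0)
x_labeled = rng.uniform(-1.0, 1.0, size=(8, 1))
y_labeled = objective(x_labeled)
x_unlabeled = rng.uniform(-1.0, 1.0, size=(30, 1))  # uniform, not optimized

good_teacher = fit_kernel_ridge(x_labeled, y_labeled)
feedback_good = student_feedback(good_teacher, x_unlabeled, x_labeled, y_labeled)
feedback_bad = student_feedback(bad_teacher, x_unlabeled, x_labeled, y_labeled)
print(f"feedback (fitted teacher): {feedback_good:.4f}")
print(f"feedback (constant teacher): {feedback_bad:.4f}")
```

In this toy setting, a teacher that produces poor pseudo labels yields a student that generalizes poorly to the labeled data, so the feedback value gives a signal that could be used to adjust the teacher.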

Paper Structure (36 sections, 13 equations, 4 figures, 9 tables, 1 algorithm)

Figures (4)

  • Figure 1: Visualization of queried data (dots) and trends (arrow sequences) on a high-dimensional molecule design task [sterling2015zinc] to maximize the Penalized LogP score [gomez2018semibo1]. Red and blue represent TSBO and a vanilla BO baseline, respectively. The evaluation budget is 450 for both approaches.
  • Figure 2: Illustrative example of how the interaction between the unlabeled data sampler, the teacher, and the student implements selective regularization. (a): unlabeled data are sampled from regions with potentially high values. (b): the teacher predicts pseudo labels for the unlabeled data. (c): the student learns from the unlabeled data and the predicted pseudo labels, and its evaluation on labeled data serves as the feedback. (d): the teacher refines its predictions based on the feedback. (e): the GP in $\texttt{TSBO}$ fits on both the labeled data and the unlabeled data with refined pseudo labels. (f): the GP in vanilla BO fits only on the labeled data.
  • Figure 3: Overview of the $\texttt{TSBO}$ framework. (a): the basic Latent Space BO architecture. (b): the vanilla BO flow uses only the encoded labeled data to train the GP model and query the next data point. (c): the $\texttt{TSBO}$ flow incorporates a TS Core to provide additional high-quality unlabeled data to the GP model during each BO iteration. (d): inside the TS Core, the optimized unlabeled data sampler and the feedback from the student provide selective regularization to the teacher.
  • Figure 4: Comparison of the mean performance and standard deviation of 4 LSO baselines and TSBO.