Principled Bayesian Optimisation in Collaboration with Human Experts

Wenjie Xu, Masaki Adachi, Colin N. Jones, Michael A. Osborne

TL;DR

This paper presents COBOL, a principled Bayesian optimisation framework that collaborates with human experts via binary labels. It jointly models the objective with a Gaussian process and the expert belief with a likelihood-ratio surrogate, and mixes them through a primal-dual acquisition rule to adaptively trust expert input. The authors prove sublinear growth of required expert labels (handover) and a no-harm guarantee that convergence is at least as good as vanilla BO, even against adversarial advice. Empirically, COBOL accelerates convergence in battery-design tasks while remaining robust to varying labelling accuracy, and maintains practical computation times. The approach offers a data-driven mechanism to harness human expertise in challenging scientific optimisation problems, with avenues for extension to other feedback forms and integration with advanced models such as LLMs.

Abstract

Bayesian optimisation for real-world problems is often performed interactively with human experts, and integrating their domain knowledge is key to accelerating the optimisation process. We consider a setup where experts provide advice on the next query point through binary accept/reject recommendations (labels). Experts' labels are often costly, requiring efficient use of their efforts, and can at the same time be unreliable, requiring careful adjustment of the degree to which any expert is trusted. We introduce the first principled approach that provides two key guarantees. (1) Handover guarantee: similar to a no-regret property, we establish a sublinear bound on the cumulative number of experts' binary labels. Initially, multiple labels per query are needed, but the number of expert labels required asymptotically converges to zero, saving both expert effort and computation time. (2) No-harm guarantee with data-driven trust level adjustment: our adaptive trust level ensures that the convergence rate will not be worse than the one without using advice, even if the advice from experts is adversarial. Unlike existing methods that employ a user-defined function that hand-tunes the trust level adjustment, our approach enables data-driven adjustments. Real-world applications empirically demonstrate that our method not only outperforms existing baselines, but also maintains robustness despite varying labelling accuracy, in tasks of battery design with human experts.

Paper Structure

This paper contains 57 sections, 9 theorems, 78 equations, 12 figures, 3 tables, 2 algorithms.

Key Result

Lemma 3.1

Let Assumptions assump:support_set, assump:bounded_norm and assump:obj_obs hold. For any $\delta\in(0, 1)$, with probability at least $1-\delta/2$, the following holds for all $x \in \mathcal{X}$ and $1\leq t \leq T$, $T\in\mathbb{N}$, where $\mu_{f_t}(x), \sigma_{f_t}(x)$ and $\gamma^f_{|\mathcal{Q}^f_{t-1}|}$ are as given in Eq. eq:mean_cov and Eq. eq:max_inf_gain, and $\gamma^f_{0}=0$.
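The quantities $\mu_{f_t}(x)$ and $\sigma_{f_t}(x)$ in the lemma are the Gaussian-process posterior mean and standard deviation. As a rough sketch of how such quantities are typically computed, the snippet below implements textbook GP regression. The RBF kernel, the regulariser `lam`, the lengthscale, and the function names `rbf_kernel` and `gp_posterior` are illustrative assumptions; they are not taken from the paper, and the lemma's confidence-width constant is not reproduced here.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_query, lam=1e-2, lengthscale=0.2):
    """Textbook GP posterior mean and standard deviation at x_query.

    lam is an assumed regularisation (noise) term added to the kernel matrix.
    """
    k_tt = rbf_kernel(x_train, x_train, lengthscale) + lam * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, lengthscale)
    chol = np.linalg.cholesky(k_tt)  # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - (v**2).sum(axis=0)  # prior variance k(x, x) = 1 for this kernel
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean, std


# Toy usage: three observations of an arbitrary 1-D function.
x_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = np.sin(6.0 * x_obs[:, 0])
x_new = np.linspace(0.0, 1.0, 5)[:, None]
mu, sigma = gp_posterior(x_obs, y_obs, x_new)
lower, upper = mu - 2.0 * sigma, mu + 2.0 * sigma  # width factor 2.0 is arbitrary
```

A confidence band of the form `mu ± beta * sigma` then brackets the objective; the factor `2.0` above is a placeholder, not the value the lemma prescribes.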

Figures (12)

  • Figure 1: BO-expert collaboration framework: The algorithm (red) decides if an expert's (blue) label is necessary. If the expert rejects the candidate, the algorithm generates a different one; otherwise, it queries the candidate directly.
  • Figure 2: Visual explanation: While the vanilla LCB returns $x^u_t$, a point far from the global minimum $x^*$, the expert-augmented LCB can successfully navigate to a closer point $x^c_t$ by mixing $f_t$ and $g_t$ with $\underline{f}_t+\lambda_t\underline{g}_t$, where $\lambda_t$ is the dual variable. In the figure, $D_t^f$ is the set of sample points of the objective function $f$ and $D_t^g$ is the set of human feedback.
  • Figure 3: Robustness and sensitivity analysis using the Ackley function. Lines and shaded areas denote mean $\pm$ 1 standard error. The no-harm guarantee ensures the convergence rate is on par with vanilla LCB even in adversarial cases. The handover guarantee ensures that $\mathcal{Q}^g_t$ plateaus, allowing optimisation without expert intervention once sufficient information has been elicited.
  • Figure 4: Ablation study on five common synthetic functions with synthetic expert labels ($a = 1$).
  • Figure 5: Real-world experiments with four human experts on lithium-ion batteries.
  • ...and 7 more figures
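
The mixing rule in the Figure 2 caption, which minimises $\underline{f}_t+\lambda_t\underline{g}_t$ instead of $\underline{f}_t$ alone, can be illustrated on a finite candidate grid. The sketch below is only a toy: the quadratic lower bounds, the grid, the helper name `augmented_lcb_argmin`, and the fixed values of `lam` are all assumptions made for illustration, and it assumes that lower values of $\underline{g}_t$ mark points the expert favours. It does not reproduce the paper's update of the dual variable.

```python
import numpy as np


def augmented_lcb_argmin(f_lower, g_lower, lam):
    """Index of the grid point minimising f_lower + lam * g_lower.

    f_lower, g_lower: lower confidence bounds evaluated on the same grid.
    lam: non-negative weight playing the role of the dual variable.
    """
    f_lower = np.asarray(f_lower, dtype=float)
    g_lower = np.asarray(g_lower, dtype=float)
    if f_lower.shape != g_lower.shape:
        raise ValueError("f_lower and g_lower must share a shape")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return int(np.argmin(f_lower + lam * g_lower))


# Toy bounds (hypothetical): the objective's lower bound bottoms out at 0.2,
# while the expert surrogate's lower bound bottoms out at 0.8.
grid = np.linspace(0.0, 1.0, 101)
f_lower = (grid - 0.2) ** 2
g_lower = (grid - 0.8) ** 2

x_vanilla = grid[augmented_lcb_argmin(f_lower, g_lower, lam=0.0)]  # plain LCB choice
x_mixed = grid[augmented_lcb_argmin(f_lower, g_lower, lam=1.0)]    # expert-weighted choice
```

With `lam = 0` the rule reduces to the vanilla LCB minimiser; increasing `lam` pulls the chosen point towards the region the expert surrogate prefers, which is the qualitative behaviour the caption describes.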

Theorems & Definitions (18)

  • Example 2.2
  • Lemma 3.1: Theorem 2, chowdhury2017kernelized
  • Lemma 3.2: Likelihood-based confidence set
  • Remark 3.3: Choice of $\epsilon$
  • Remark 3.4: Confidence bound
  • Remark 3.5: Pointwise predictive interval estimation
  • Theorem 4.1
  • Lemma A.1
  • proof
  • Lemma A.2
  • ...and 8 more