Principled Bayesian Optimisation in Collaboration with Human Experts
Wenjie Xu, Masaki Adachi, Colin N. Jones, Michael A. Osborne
TL;DR
This paper presents COBOL, a principled Bayesian optimisation framework that collaborates with human experts through binary accept/reject labels. It models the objective with a Gaussian process and the expert's belief with a likelihood-ratio surrogate, then combines the two through a primal-dual acquisition rule that adapts how much the expert's input is trusted. The authors prove a handover guarantee (the cumulative number of expert labels required grows sublinearly) and a no-harm guarantee (the convergence rate is no worse than that of vanilla BO, even under adversarial advice). Empirically, COBOL accelerates convergence on battery-design tasks, remains robust to varying labelling accuracy, and keeps computation times practical. The approach offers a data-driven mechanism for harnessing human expertise in challenging scientific optimisation problems, with avenues for extension to other forms of feedback and for integration with advanced models such as LLMs.
Abstract
Bayesian optimisation for real-world problems is often performed interactively with human experts, and integrating their domain knowledge is key to accelerating the optimisation process. We consider a setup where experts provide advice on the next query point through binary accept/reject recommendations (labels). Experts' labels are often costly, requiring efficient use of their efforts, and can at the same time be unreliable, requiring careful adjustment of the degree to which any expert is trusted. We introduce the first principled approach that provides two key guarantees. (1) Handover guarantee: similar to a no-regret property, we establish a sublinear bound on the cumulative number of experts' binary labels. Initially, multiple labels per query are needed, but the number of expert labels required asymptotically converges to zero, saving both expert effort and computation time. (2) No-harm guarantee with data-driven trust level adjustment: our adaptive trust level ensures that the convergence rate will not be worse than the one without using advice, even if the advice from experts is adversarial. Unlike existing methods, which rely on a user-defined function to hand-tune the trust level, our approach adjusts the trust level from data. Experiments on real-world battery design tasks with human experts demonstrate that our method not only outperforms existing baselines but also remains robust to varying labelling accuracy.
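To make the idea of a data-driven trust level concrete, the following is a minimal sketch of a generic primal-dual (Lagrangian) step on a finite set of candidates. It is an illustration under stated assumptions, not the acquisition rule analysed in the paper: the function name `primal_dual_step`, the step size `eta`, the sign convention for `expert_score` (non-positive meaning "likely accepted"), and the toy arrays are all hypothetical. The dual variable `lam` plays the role of the weight placed on expert advice and is updated from observed data rather than by a hand-tuned schedule.

```python
import numpy as np

def primal_dual_step(objective_lcb, expert_score, lam, eta=0.5):
    """One generic primal-dual step on a finite candidate grid (illustrative only).

    objective_lcb: lower-bound estimate of the objective to minimise, per candidate.
    expert_score:  hypothetical surrogate of expert disapproval; <= 0 means "likely accepted".
    lam:           dual variable, read here as the weight placed on expert advice.
    eta:           dual step size (an assumed tuning constant).
    """
    # Primal step: pick the candidate that minimises the Lagrangian.
    lagrangian = objective_lcb + lam * expert_score
    idx = int(np.argmin(lagrangian))
    # Dual step: projected gradient ascent keeps the weight non-negative.
    new_lam = max(0.0, lam + eta * expert_score[idx])
    return idx, new_lam

# Toy data (hypothetical): five candidate query points.
objective_lcb = np.array([0.9, 0.2, 0.5, 0.1, 0.7])
expert_score = np.array([-1.0, -0.5, 0.0, 1.0, 0.5])

# With lam = 0 the expert is ignored and the best objective estimate wins.
idx0, lam1 = primal_dual_step(objective_lcb, expert_score, lam=0.0)
# The chosen point was disapproved of, so lam grew and now steers the next choice.
idx1, lam2 = primal_dual_step(objective_lcb, expert_score, lam=lam1)
print(idx0, lam1, idx1, lam2)
```

In this toy run the first step selects the candidate with the lowest objective estimate regardless of the expert. Because that candidate has a positive disapproval score, the weight increases, and the second step moves to a candidate the expert is likely to accept. The weight then decreases again once the selected point satisfies the expert.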
