Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds

Aya Kayal, Sattar Vakili, Laura Toni, Da-shan Shiu, Alberto Bernacchia

TL;DR

This work addresses Bayesian optimization from human feedback (BOHF), where only a pairwise comparison between two actions is revealed at each step. The authors introduce Multi-Round Learning from Preference-based Feedback (MR-LPF), a kernel-based algorithm that proceeds in rounds, progressively reducing uncertainty and pruning unpromising actions. They prove regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$ that remove the dependence on the curvature parameter $\kappa$ and match the order of conventional BO, along with corresponding sample complexity guarantees. Experiments on synthetic functions and a Yelp dataset support the theory: MR-LPF outperforms prior preference-based BO methods and scales to real-world data. Overall, the paper shows that, for common kernels, preferential feedback requires the same order of samples as scalar feedback to reach a near-optimal solution, substantially narrowing the gap to conventional BO.

Abstract

Bayesian optimization (BO) with preference-based feedback has recently garnered significant attention due to its emerging applications. We refer to this problem as Bayesian Optimization from Human Feedback (BOHF), which differs from conventional BO by learning the best actions from a reduced feedback model, where only the preference between two actions is revealed to the learner at each time step. The objective is to identify the best action using a limited number of preference queries, typically obtained through costly human feedback. Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms. In this work, within the same framework, we develop tighter performance guarantees. Specifically, we derive regret bounds of $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$, where $\Gamma(T)$ represents the maximum information gain (a kernel-specific complexity term) and $T$ is the number of queries. Our results significantly improve upon existing bounds. Notably, for common kernels, we show that the order-optimal sample complexities of conventional BO, which are achieved with richer feedback models, are recovered. In other words, the same number of preferential samples as scalar-valued samples is sufficient to find a nearly optimal solution.
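To make the bound concrete, the following worked sketch substitutes standard maximum-information-gain rates from the kernel literature into $\tilde{\mathcal{O}}(\sqrt{\Gamma(T)T})$. The symbols $R(T)$ (cumulative regret), $d$ (input dimension), $\nu$ (Matérn smoothness), and $\epsilon$ (target suboptimality) are notation assumed here for illustration, and the rates for $\Gamma(T)$ are the commonly cited ones for conventional BO rather than quantities restated in the abstract.

```latex
% Regret implied by \tilde{O}(\sqrt{\Gamma(T) T}) under assumed rates for \Gamma(T)
\begin{align*}
\text{SE kernel:}\quad
  & \Gamma(T) = \mathcal{O}\big(\log^{d+1} T\big)
  && \Rightarrow\quad R(T) = \tilde{\mathcal{O}}\big(\sqrt{T}\big), \\
\text{Mat\'ern-}\nu\text{ kernel:}\quad
  & \Gamma(T) = \tilde{\mathcal{O}}\big(T^{\frac{d}{2\nu+d}}\big)
  && \Rightarrow\quad R(T) = \tilde{\mathcal{O}}\big(T^{\frac{\nu+d}{2\nu+d}}\big).
\end{align*}

% Number of queries T needed so that the average regret R(T)/T is at most \epsilon
\begin{align*}
\text{SE kernel:}\quad
  & \tilde{\mathcal{O}}\big(T^{-1/2}\big) \le \epsilon
  && \Rightarrow\quad T = \tilde{\mathcal{O}}\big(\epsilon^{-2}\big), \\
\text{Mat\'ern-}\nu\text{ kernel:}\quad
  & \tilde{\mathcal{O}}\big(T^{-\frac{\nu}{2\nu+d}}\big) \le \epsilon
  && \Rightarrow\quad T = \tilde{\mathcal{O}}\big(\epsilon^{-2-\frac{d}{\nu}}\big).
\end{align*}
```

Under these assumptions, the query counts have the same order as those known for conventional BO with scalar feedback, which is the sense in which preferential samples are "sufficient" in the abstract.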

Paper Structure

This paper contains 38 sections, 8 theorems, 59 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Consider the BOHF framework described in Section BOHF and the MR-LPF algorithm presented in Algorithm alg1. For $\delta\in(0,1)$, in MR-LPF, let where, $B$ is the upper bound on the RKHS norm of $f$ given in Assumption ass:RKHS_norm, $L= \max_{x,x'\in\mathcal{X}} \dot{\mu}(h(x,x'))$, $\kappa_1=\kappa$ defined in Equation kappa, $\forall r>1, \kappa_r=6$, $\lambda$ is the regularization parameter

Figures (3)

  • Figure 1: Average regret against $T$ with RKHS test functions (top row) and Ackley test function (bottom row). The shaded area represents the standard error.
  • Figure 2: Average regret against $T$ for the experiment with Yelp Open Dataset. The shaded area represents the standard error.
  • Figure 3: Plots of the utility function $f(x)$, the preference function $h(x,x') = f(x) - f(x')$, and the probability of preference $\mu(h(x,x'))$ for synthetic experiments. The rows correspond to: (1st row) SE kernel (RKHS), (2nd row) Matérn kernel with $\nu=2.5$ (RKHS), (3rd row) Matérn kernel with $\nu=1.5$ (RKHS), and (4th row) Ackley function.
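The three quantities plotted in Figure 3 can be illustrated with a minimal Python sketch of the BTL feedback model, assuming the standard logistic link for $\mu$. The `utility` function, the grid, and all function names below are hypothetical stand-ins chosen for illustration; they are not the paper's test functions or code.

```python
import numpy as np

def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z)), the usual BTL choice (assumed here)."""
    return 1.0 / (1.0 + np.exp(-z))

def utility(x):
    """Hypothetical latent utility f(x); a stand-in, not one of the paper's test functions."""
    return np.sin(3.0 * x) + 0.5 * x

def preference_prob(x, x_prime, f=utility):
    """Probability that x is preferred to x': mu(h(x, x')) with h(x, x') = f(x) - f(x')."""
    return sigmoid(f(x) - f(x_prime))

def query_preference(x, x_prime, rng, f=utility):
    """Simulate one binary preference query: 1 if x is preferred, 0 otherwise."""
    return int(rng.random() < preference_prob(x, x_prime, f))

# Tabulate h and mu(h) on a grid, mirroring the panels described in Figure 3.
grid = np.linspace(0.0, 1.0, 101)
H = utility(grid)[:, None] - utility(grid)[None, :]   # h(x, x') for all pairs
P = sigmoid(H)                                        # mu(h(x, x'))

# L = max over pairs of the link derivative; for the logistic link mu' = mu * (1 - mu).
L = float(np.max(P * (1.0 - P)))

rng = np.random.default_rng(0)
y = query_preference(0.9, 0.1, rng)   # a single noisy comparison, the only feedback revealed
```

The learner never observes `utility` directly; it only sees draws like `y`, which is why the shape of $\mu$ (through quantities such as $L$ and $\kappa$) enters the analysis.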

Theorems & Definitions (12)

  • Theorem 4.1: Regret bound for MR-LPF
  • Remark 4.2
  • Remark 4.3
  • Corollary 4.4
  • Corollary 4.5
  • Remark 4.6
  • Theorem 4.7: Confidence Bounds
  • Theorem 1.1
  • Theorem 1.2
  • Definition 1.3
  • ...and 2 more