PABBO: Preferential Amortized Black-Box Optimization

Xinyu Zhang, Daolang Huang, Samuel Kaski, Julien Martinelli

TL;DR

The paper tackles efficient optimization of latent user utilities from preferential feedback, a setting where traditional preferential black-box optimization (PBBO) relies on expensive Gaussian process (GP) posterior inference. It introduces PABBO, an end-to-end amortized framework that jointly learns a surrogate and an acquisition function via a transformer-based neural process, trained with reinforcement learning and an auxiliary binary prediction loss. Empirical results show inference that is orders of magnitude faster, and often more accurate, across synthetic benchmarks, hyperparameter optimization, and human-preference datasets, with additional validation through ablations and batch extensions for higher-dimensional tasks. This approach enables real-time preferential optimization in interactive settings and offers a scalable alternative to GP-based methods, while highlighting areas for future work such as larger pretraining datasets and dimension-agnostic architectures.

Abstract

Preferential Bayesian Optimization (PBO) is a sample-efficient method to learn latent user utilities from preferential feedback over a pair of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian process, and an acquisition strategy to select the next candidate pair to get user feedback on. Due to the non-conjugacy of the associated likelihood, every PBO step requires a significant amount of computation with various approximate inference techniques. This computational overhead is incompatible with the way humans interact with computers, hindering the use of PBO in real-world cases. Building on the recent advances of amortized BO, we propose to circumvent this issue by fully amortizing PBO, meta-learning both the surrogate and the acquisition function. Our method comprises a novel transformer neural process architecture, trained using reinforcement learning and tailored auxiliary losses. On a benchmark composed of synthetic and real-world datasets, our method is several orders of magnitude faster than the usual Gaussian process-based strategies and often outperforms them in accuracy.
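To make the preferential setting concrete, here is a minimal Python sketch of the feedback loop the abstract describes: the optimizer never sees the latent utility, only which of two designs wins a duel. Everything named here is an illustrative assumption rather than the paper's implementation: the toy utility `latent_utility`, the logistic link in `preference_probability`, its `scale` parameter, and the helper `bce`, which shows the kind of binary cross-entropy loss a preference-prediction head could be trained with.

```python
import math

def latent_utility(x):
    # Hypothetical 1-D utility, maximized at x = 0.3 (illustration only).
    return -(x - 0.3) ** 2

def duel(x, x_prime):
    # Noiseless preferential feedback: 1 if x is preferred over x_prime, else 0.
    # Only this label is shown to the optimizer, never the utility values.
    return 1 if latent_utility(x) > latent_utility(x_prime) else 0

def preference_probability(x, x_prime, scale=0.05):
    # Assumed logistic link: P(x preferred) = sigmoid((f(x) - f(x')) / scale).
    diff = (latent_utility(x) - latent_utility(x_prime)) / scale
    return 1.0 / (1.0 + math.exp(-diff))

def bce(label, prob, eps=1e-12):
    # Binary cross-entropy between an observed duel outcome and a predicted
    # preference probability, clipped to avoid log(0).
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

if __name__ == "__main__":
    label = duel(0.3, 0.9)
    prob = preference_probability(0.3, 0.9)
    print(label, round(prob, 4), round(bce(label, prob), 4))
```

With a Gaussian process prior on the utility, a Bernoulli likelihood of this kind has no closed-form posterior, which is the non-conjugacy the abstract refers to.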

Paper Structure

This paper contains 37 sections, 5 equations, 17 figures, 1 table, 2 algorithms.

Figures (17)

  • Figure 1: Leveraging synthetic and existing BO datasets, PABBO learns an acquisition policy in an end-to-end manner, thus directly amortizing the design proposal step. Contrary to existing methods, PABBO relies only on preference data at inference time. Our method outperforms GP-based strategies and dramatically speeds up the optimization.
  • Figure 2: Flowchart of PABBO (left), together with a zoom on our proposed transformer block (right). We use reinforcement learning to guide the acquisition head in proposing valuable query pairs, and apply an auxiliary BCE loss to the prediction head to stabilize the training of the transformer.
  • Figure 3: Simple regret and inference time on synthetic examples. Mean with 95% confidence intervals computed across 30 runs with random starting pairs. PABBO consistently achieves the lowest simple regret across all tasks, except for the GP cases, where it performs comparably to qNEI, while offering a 10$\times$ speedup.
  • Figure 4: Simple regret on different search spaces from the HPO-B benchmark and human preferences datasets. Mean with 95% confidence intervals computed across 30 runs with random starting pairs. Each PABBO entry is labeled with $S$, the size of the query set. On average, PABBO ranks first on HPO-B tasks and second on human preferences datasets.
  • Figure 5: Ablation studies. Mean with 95% confidence intervals computed across 30 runs with random starting pairs. On low-dimensional examples, PABBO maintains similar performance when substituting latent function values for rankings, and performs as expected when varying the discount factor $\gamma$ and query set size $S$.
  • ...and 12 more figures
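
The captions above report simple regret and mention a discount factor $\gamma$. As a hedged illustration of these two quantities in their standard form, the Python sketch below computes a simple-regret curve from ground-truth utilities and a discounted sum of per-step rewards. The function names and the example numbers are hypothetical, and the exact reward used to train PABBO is not given in this excerpt, so `discounted_return` takes arbitrary rewards as input.

```python
def simple_regret(f_opt, queried_values):
    # Gap between the optimal utility and the best utility among queried designs.
    # This is an evaluation metric: it uses ground-truth utilities that the
    # optimizer itself never observes in the preferential setting.
    return f_opt - max(queried_values)

def simple_regret_curve(f_opt, queried_values):
    # Simple regret after each query, using the running best value so far.
    best = float("-inf")
    curve = []
    for value in queried_values:
        best = max(best, value)
        curve.append(f_opt - best)
    return curve

def discounted_return(rewards, gamma):
    # Standard discounted sum: rewards[0] + gamma * rewards[1] + gamma^2 * rewards[2] + ...
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total

if __name__ == "__main__":
    values = [0.2, 0.7, 0.5]  # hypothetical utilities of the queried designs
    print(simple_regret_curve(1.0, values))
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

A smaller `gamma` weights early rewards more heavily, while `gamma = 1` treats every step equally.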