PABBO: Preferential Amortized Black-Box Optimization
Xinyu Zhang, Daolang Huang, Samuel Kaski, Julien Martinelli
TL;DR
The paper tackles efficient optimization of latent user utilities from preferential feedback, a setting where traditional preferential black-box optimization (PBBO) relies on expensive GP-based posterior inference. It introduces PABBO, an end-to-end amortized framework that jointly learns a surrogate and an acquisition function via a transformer-based neural process, trained with reinforcement learning and an auxiliary binary prediction loss. Empirical results show inference that is orders of magnitude faster, and often more accurate, across synthetic benchmarks, hyperparameter optimization, and human-preference datasets. Ablations and a batch extension for higher-dimensional tasks provide additional validation. The approach enables real-time preferential optimization in interactive settings and offers a scalable alternative to GP-based methods, while highlighting areas for future work such as larger pretraining datasets and dimension-agnostic architectures.
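To make the auxiliary binary prediction loss concrete, here is a minimal sketch of a binary cross-entropy between predicted preference logits and observed pairwise labels. The function name `preference_bce` and the convention that a label of 1 means the first design was preferred are assumptions for illustration; the loss actually used in the paper may differ in form and weighting.

```python
import math

def preference_bce(logits, labels):
    """Mean binary cross-entropy for pairwise preference predictions.

    logits[i] is a hypothetical model score for "first design preferred"
    on pair i; labels[i] is 1 if the first design was preferred, else 0.
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("logits and labels must be non-empty and of equal length")
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of
        # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```

A confident, correct prediction (large positive logit with label 1) yields a loss near zero, while an uninformative logit of 0 yields log 2 regardless of the label.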
Abstract
Preferential Bayesian Optimization (PBO) is a sample-efficient method to learn latent user utilities from preferential feedback over a pair of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian process, and an acquisition strategy to select the next candidate pair to get user feedback on. Due to the non-conjugacy of the associated likelihood, every PBO step requires a significant amount of computation with various approximate inference techniques. This computational overhead is incompatible with the way humans interact with computers, hindering the use of PBO in real-world cases. Building on the recent advances of amortized BO, we propose to circumvent this issue by fully amortizing PBO, meta-learning both the surrogate and the acquisition function. Our method comprises a novel transformer neural process architecture, trained using reinforcement learning and tailored auxiliary losses. On a benchmark composed of synthetic and real-world datasets, our method is several orders of magnitude faster than the usual Gaussian process-based strategies and often outperforms them in accuracy.
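As an illustration of the preferential feedback setting, the sketch below simulates a user who compares two designs under a latent utility. The quadratic utility, the logistic link, the noise scale, and the names `utility`, `preference_prob`, and `query_user` are all assumptions chosen for illustration, not the paper's experimental setup. Because each observation is a binary outcome rather than a Gaussian-noised function value, a Gaussian process prior combined with this likelihood has no closed-form posterior, which is the non-conjugacy referred to above.

```python
import math
import random

def utility(x):
    # Hypothetical latent utility on [0, 1], maximized at x = 0.3.
    # The optimizer never observes this value directly.
    return -(x - 0.3) ** 2

def preference_prob(x_a, x_b, scale=0.05):
    # Probability that design x_a is preferred over x_b under an
    # assumed logistic link on the utility difference.
    return 1.0 / (1.0 + math.exp(-(utility(x_a) - utility(x_b)) / scale))

def query_user(x_a, x_b, rng):
    # Return 1 if the simulated user prefers x_a, else 0.
    return 1 if rng.random() < preference_prob(x_a, x_b) else 0

rng = random.Random(0)
# Repeatedly compare the optimum (0.3) against a poor design (0.9).
votes = [query_user(0.3, 0.9, rng) for _ in range(200)]
```

Only the binary entries of `votes` would be available to a preferential optimizer; the utility values themselves stay hidden, so the surrogate must be inferred from comparisons alone.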
