Cooperative Bayesian Optimization for Imperfect Agents

Ali Khoshvishkaie; Petrus Mikkola; Pierre-Alexandre Murena; Samuel Kaski

TL;DR

This paper studies cooperative Bayesian optimization in which two agents, a human user and an AI agent, sequentially select query points in a two-dimensional domain, each controlling one coordinate, with the goal of maximizing a noisy black-box function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. It proposes a model-based approach using Bayes Adaptive Monte Carlo Planning, in which the AI reasons about a computationally rational user who holds a Gaussian process prior over $f$; the user model incorporates a conservative belief-update rule and a Thurstone-inspired model of user choices. The AI plans its actions by simulating the user and optimizing a reward that balances the user's expected performance against the AI's own knowledge of the function. This strategic planning identifies the global maximum more effectively than greedy or random strategies. Empirical validation on a Himmelblau-like benchmark shows that the method achieves higher optimization scores and robust improvements across varying user traits and priors, albeit at a notable computational cost that motivates future work on real-time applicability and scalability.
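The cooperative query protocol described above can be illustrated with a minimal Python sketch. It rests on several assumptions not confirmed by the source: in each round the AI picks x first and the user then picks y after seeing x; the standard Himmelblau function, negated for maximization, stands in for the paper's Himmelblau-like benchmark; and the coordinate grid, noise level, and the hypothetical policies `make_random_ai` and `make_greedy_user` are placeholders, not the paper's planning agent or user model.

```python
import random

# Assumed discretization of each coordinate: -5.0, -4.5, ..., 5.0.
GRID = [i / 2 for i in range(-10, 11)]


def himmelblau_objective(x: float, y: float) -> float:
    """Negated standard Himmelblau function; its maxima (value 0) sit at the four
    minima of Himmelblau's function, e.g. (3, 2). Assumed stand-in benchmark."""
    return -((x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2)


def cooperative_bo(ai_policy, user_policy, f, n_rounds, noise_sd=0.1, seed=0):
    """Each round: the AI picks x, the user picks y after seeing x (assumed order),
    and the joint point (x, y) is evaluated with Gaussian observation noise."""
    rng = random.Random(seed)
    history = []  # list of (x, y, noisy observation) tuples
    for _ in range(n_rounds):
        x = ai_policy(history)
        y = user_policy(history, x)
        history.append((x, y, f(x, y) + rng.gauss(0.0, noise_sd)))
    return history


def make_random_ai(seed):
    """Hypothetical baseline AI: picks x uniformly at random from GRID."""
    rng = random.Random(seed)
    return lambda history: rng.choice(GRID)


def make_greedy_user(seed):
    """Hypothetical greedy user: ignores x and reuses the y of the best noisy
    observation seen so far (random y on the first round)."""
    rng = random.Random(seed)

    def policy(history, x):
        if not history:
            return rng.choice(GRID)
        return max(history, key=lambda q: q[2])[1]

    return policy


history = cooperative_bo(make_random_ai(1), make_greedy_user(2), himmelblau_objective, n_rounds=20)
best = max(history, key=lambda q: q[2])
print(f"best query: x={best[0]}, y={best[1]}, observed value={best[2]:.2f}")
```

Because each agent only ever returns its own coordinate, neither policy can reach a good point alone; the planning agent of the paper replaces `make_random_ai` with a policy that anticipates the user's reply.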

Abstract

We introduce a cooperative Bayesian optimization problem for optimizing black-box functions of two variables, where two agents jointly choose the points at which to query the function but each controls only one variable. This setting is inspired by human-AI teamwork, where an AI assistant helps its human user solve a problem, in this simplest case collaborative optimization. We formulate the solution as sequential decision-making, where the agent we control models the user as a computationally rational agent with prior knowledge about the function. We show that strategic planning of the queries enables better identification of the global maximum of the function, as long as the user avoids excessive exploration. This planning is made possible by using Bayes Adaptive Monte Carlo Planning and by endowing the agent with a user model that accounts for conservative belief updates and exploratory sampling of the points to query.
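The two user-model ingredients named in the abstract, conservative belief updates and exploratory sampling, can be illustrated with a small numpy sketch. This is not the paper's specification. As an assumption, the conservative update is modelled by tempering the Gaussian likelihood with a hypothetical exponent `temper` in (0, 1], which is equivalent to inflating the observation-noise variance, and exploratory sampling is modelled as a multinomial-probit (Thurstone-style) choice with a hypothetical noise scale `choice_noise`. The zero-mean GP prior, RBF kernel, and all parameter values are likewise assumptions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def conservative_posterior_mean(X, obs, X_new, noise_var=0.01, temper=0.3):
    """Zero-mean GP posterior mean under a tempered likelihood p(obs | f) ** temper.

    For Gaussian noise, tempering with temper in (0, 1] is equivalent to using
    noise variance noise_var / temper, so the belief moves less toward the data
    than an exact Bayesian update (temper = 1) would. `temper` is hypothetical.
    """
    K = rbf_kernel(X, X) + (noise_var / temper) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, obs))  # K^{-1} obs
    return rbf_kernel(X_new, X) @ weights


def thurstonian_choice(utilities, choice_noise, rng):
    """Thurstone-style random-utility choice: perturb each option's utility with
    i.i.d. Gaussian noise and pick the argmax. Larger choice_noise means more
    exploratory choices; choice_noise = 0 gives a purely greedy choice."""
    noisy = utilities + choice_noise * rng.standard_normal(len(utilities))
    return int(np.argmax(noisy))


def simulated_user_choice(X, obs, x, y_grid, rng, temper=0.3, choice_noise=0.5):
    """Hypothetical simulated user: given the AI's x, score each candidate y by the
    user's conservative posterior mean at (x, y), then choose y stochastically."""
    candidates = np.column_stack([np.full(len(y_grid), x), y_grid])
    utilities = conservative_posterior_mean(X, obs, candidates, temper=temper)
    return y_grid[thurstonian_choice(utilities, choice_noise, rng)]


rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [1.0, 1.0]])  # past joint queries (x, y)
obs = np.array([0.5, 1.0])              # their observed values
y_grid = np.linspace(-2.0, 2.0, 9)
print(simulated_user_choice(X, obs, x=1.0, y_grid=y_grid, rng=rng))
```

With a single observation of value 1 and `noise_var = 0.01`, the posterior mean at that point is 1/1.01 for an exact update (`temper = 1`) but only 0.5 for `temper = 0.01`, which is the sense in which the update is conservative.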

Paper Structure

This paper contains 31 sections, 11 equations, 2 figures, and 3 tables.

Introduction
Cooperative Bayesian Optimization
Problem Formulation
Mathematical Formalization
User Model
Implementation
Bayes Adaptive Monte Carlo Planning
User Model Specification
User's Knowledge.
Belief Update.
Decision-Making.
Summary: Definition of the User Model.
Inference of the User Model Parameters
Estimation of $(\alpha, \beta)$.
Estimation of $f_{um}$.
...and 16 more sections

Figures (2)

Figure 1: Interaction scenario between the user and the AI agent in the optimization task. Unlike a greedy agent (a), the AI agent we propose (b) has a model of the user and plans its actions by anticipating the user's behaviour. This results in more efficient cooperative exploration of the domain and therefore avoids getting stuck in a local optimum. This is visible in the right-hand plots, which show the corresponding trajectories of queries to the function $f$.
Figure 2: Evolution of the optimization performance during the interaction. At the end of the interaction, our agent (StrategicAI) achieves better performance than the other baselines, with the exception of VanillaBO (GP-UCB): it performs slightly worse than that baseline because, unlike VanillaBO, StrategicAI does not control the full domain $\mathcal{X} \times \mathcal{Y}$.
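For reference, the VanillaBO baseline in Figure 2 is GP-UCB with control over the full domain. A generic GP-UCB loop can be sketched in numpy as below; the grid discretization, unit-lengthscale RBF kernel, standardization of observations, fixed exploration weight `beta` (theoretical GP-UCB uses a schedule that grows with the iteration count), and negated Himmelblau objective are all assumptions rather than the configuration used in the paper.

```python
import numpy as np


def himmelblau_objective(x, y):
    """Negated standard Himmelblau function (assumed stand-in benchmark)."""
    return -((x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2)


def rbf(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(X, z, Xs, noise_var=1e-2, lengthscale=1.0):
    """Zero-mean GP posterior mean and standard deviation at the points Xs."""
    K = rbf(X, X, lengthscale) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf(Xs, X, lengthscale)
    mean = Ks @ np.linalg.solve(L.T, np.linalg.solve(L, z))
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)  # prior variance is 1
    return mean, np.sqrt(var)


def gp_ucb(f, grid, n_iter=25, beta=4.0, noise_sd=0.1, seed=0):
    """GP-UCB on a discretized 2-D domain: query argmax of mean + sqrt(beta) * sd."""
    rng = np.random.default_rng(seed)
    X = grid[rng.integers(len(grid), size=1)]  # one random initial query
    obs = np.array([f(*X[0]) + noise_sd * rng.standard_normal()])
    for _ in range(n_iter - 1):
        # Standardize observations so the unit-variance prior is roughly on scale.
        scale = obs.std() if obs.std() > 0 else 1.0
        mean, sd = gp_posterior(X, (obs - obs.mean()) / scale, grid)
        x_next = grid[np.argmax(mean + np.sqrt(beta) * sd)]
        X = np.vstack([X, x_next])
        obs = np.append(obs, f(*x_next) + noise_sd * rng.standard_normal())
    return X, obs


axis = np.linspace(-5.0, 5.0, 41)
grid = np.array([[a, b] for a in axis for b in axis])
X, obs = gp_ucb(himmelblau_objective, grid)
print("best query:", X[np.argmax(obs)], "observed value:", obs.max())
```

Unlike the cooperative setting, this loop chooses both coordinates of `x_next` itself, which is why such a baseline can serve as an upper reference for agents that control only one coordinate.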
