Cooperative Bayesian Optimization for Imperfect Agents
Ali Khoshvishkaie, Petrus Mikkola, Pierre-Alexandre Murena, Samuel Kaski
TL;DR
This paper studies cooperative Bayesian optimization in which two agents, a human user and an AI agent, sequentially select a point in a two-coordinate space, each controlling one coordinate, with the goal of maximizing a noisy black-box function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$. It proposes a model-based approach built on Bayes-Adaptive Monte Carlo Planning that reasons about a computationally rational user who holds a Gaussian process prior over $f$; the user model incorporates a conservative belief-update rule and a Thurstone-inspired model of user choices. The AI plans its actions by simulating the user and optimizing a reward that balances the user’s expected performance against the AI’s own knowledge of the function, so that strategic planning identifies the global maximum more effectively than greedy or random strategies. Empirical validation on a Himmelblau-like benchmark shows higher optimization scores and robust improvements across varying user traits and priors. The method carries a notable computational cost, which motivates future work on real-time applicability and scalability.
Abstract
We introduce a cooperative Bayesian optimization problem for optimizing black-box functions of two variables, in which two agents jointly choose the points at which to query the function but each control only one variable. This setting is inspired by human-AI teamwork, where an AI assistant helps its human user solve a problem, here in its simplest form: collaborative optimization. We formulate the solution as sequential decision-making, where the agent we control models the user as a computationally rational agent with prior knowledge about the function. We show that strategic planning of the queries enables better identification of the global maximum of the function, as long as the user avoids excessive exploration. This planning is made possible by using Bayes-Adaptive Monte Carlo planning and by endowing the agent with a user model that accounts for conservative belief updates and exploratory sampling of the points to query.
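The interaction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the AI commits to its coordinate first and the user then chooses the second coordinate, that both coordinates live on a shared grid over $[-5, 5]$, and that the objective is the negated Himmelblau function. The names `run_episode`, `random_ai`, and `noisy_argmax_user` are hypothetical. The AI here is a random baseline rather than a planner. The user is an idealized stand-in who knows the true function and perturbs its values with Gaussian noise before picking the best option, a Thurstone-style random-utility choice that only loosely mirrors the exploratory user model in the summary.

```python
import numpy as np


def neg_himmelblau(x, y):
    """Negated Himmelblau function: its four minima become maxima with value 0."""
    return -((x**2 + y - 11.0) ** 2 + (x + y**2 - 7.0) ** 2)


def random_ai(grid, history, rng):
    """Baseline AI (assumption): picks its coordinate uniformly at random."""
    return rng.choice(grid)


def noisy_argmax_user(grid, x, history, rng, explore_std=5.0):
    """Hypothetical user: adds Gaussian noise to each option's utility and
    picks the best one. A larger explore_std means more exploration."""
    # Assumption: the user's utilities are the true function values at the fixed x.
    utilities = neg_himmelblau(x, grid)
    noisy = utilities + rng.normal(0.0, explore_std, size=grid.shape)
    return grid[np.argmax(noisy)]


def run_episode(ai_policy, user_policy, n_rounds=10, noise_std=0.1, seed=0):
    """Run the cooperative loop: AI picks x, user picks y, f(x, y) is observed with noise."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-5.0, 5.0, 41)  # shared grid for both coordinates (assumption)
    history = []
    for _ in range(n_rounds):
        x = ai_policy(grid, history, rng)        # AI controls the first coordinate
        y = user_policy(grid, x, history, rng)   # user sees x, controls the second
        z = neg_himmelblau(x, y) + rng.normal(0.0, noise_std)  # noisy observation
        history.append((x, y, z))
    return history


if __name__ == "__main__":
    for x, y, z in run_episode(random_ai, noisy_argmax_user, n_rounds=5):
        print(f"x={x:+.2f}  y={y:+.2f}  observed f={z:+.2f}")
```

Replacing `random_ai` with a policy that simulates the user's response before committing to `x` is where a planning-based agent would plug in; the loop itself stays unchanged.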
