Mean-Field Bayesian Optimisation

Petar Steinberg, Juliusz Ziomek, Matej Jusup, Ilija Bogunovic

TL;DR

This work tackles optimising the average payoff in a large population of cooperative agents when the payoff is an unknown black box. It introduces MF-GP-UCB, a mean-field Bayesian optimisation algorithm that exploits permutation invariance to achieve regret independent of the number of agents, backed by theoretical analysis and information-theoretic bounds. The authors demonstrate strong empirical performance on synthetic tasks and real-world problems, including bike-sharing, taxi fleet distribution, and maritime refuelling, highlighting substantial gains in scalability and solution quality. They implement MF-GP-UCB within existing BO frameworks, discuss centralised control as a limitation, and outline directions toward decentralised, communication-free operation for broader applicability.

Abstract

We address the problem of optimising the average payoff for a large number of cooperating agents, where the payoff function is unknown and treated as a black box. While standard Bayesian Optimisation (BO) methods struggle with the scalability required for high-dimensional input spaces, we demonstrate how leveraging the mean-field assumption on the black-box function can transform BO into an efficient and scalable solution. Specifically, we introduce MF-GP-UCB, a novel efficient algorithm designed to optimise agent payoffs in this setting. Our theoretical analysis establishes a regret bound for MF-GP-UCB that is independent of the number of agents, contrasting sharply with the exponential dependence observed when naive BO methods are applied. We evaluate our algorithm on a diverse set of tasks, including real-world problems, such as optimising the location of public bikes for a bike-sharing programme, distributing taxi fleets, and selecting refuelling ports for maritime vessels. Empirical results demonstrate that MF-GP-UCB significantly outperforms existing benchmarks, offering substantial improvements in performance and scalability, constituting a promising solution for mean-field, black-box optimisation. The code is available at https://github.com/petarsteinberg/MF-BO.

Paper Structure

This paper contains 20 sections, 1 theorem, 34 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.1

Let Assumption \ref{as:lkernel} hold, run Algorithm \ref{alg:ucb_mf_bo_centralised} for $T$ rounds, and set $\beta_t = 2\log(|A||C||\Xi_t|t^2 / \sqrt{2\pi})$, where [...]. Then we have that [...], where $\mathcal{B} = \sqrt{2|A|^3|C|}\min\{(b(\log(a|A||C|) + \sqrt{\pi} / 2))^{-1},|A||C|\}$ and $\gamma_T$ is the maximum information gain of the kernel.
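The two closed-form quantities in the theorem statement can be evaluated directly. The following is a minimal sketch that only transcribes the formulas for $\beta_t$ and $\mathcal{B}$ as written above. The function and parameter names are hypothetical, and since $|\Xi_t|$, $a$ and $b$ are not defined in this summary, they are passed in as plain numbers chosen by the caller.

```python
import math


def beta_t(num_actions: int, num_contexts: int, xi_size: float, t: int) -> float:
    """Exploration parameter beta_t = 2 log(|A||C||Xi_t| t^2 / sqrt(2 pi)).

    xi_size stands for |Xi_t|; its definition is not given in this summary,
    so it is treated as an input (assumption).
    """
    numerator = num_actions * num_contexts * xi_size * t**2
    return 2.0 * math.log(numerator / math.sqrt(2.0 * math.pi))


def b_constant(num_actions: int, num_contexts: int, a: float, b: float) -> float:
    """Constant B = sqrt(2|A|^3|C|) * min{(b(log(a|A||C|) + sqrt(pi)/2))^-1, |A||C|}.

    a and b are the constants appearing in the theorem; their meaning is
    not stated in this summary, so they are treated as inputs (assumption).
    """
    n = num_actions * num_contexts
    inverse_term = 1.0 / (b * (math.log(a * n) + math.sqrt(math.pi) / 2.0))
    return math.sqrt(2.0 * num_actions**3 * num_contexts) * min(inverse_term, n)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for t in (1, 10, 100):
        print(t, beta_t(num_actions=4, num_contexts=1, xi_size=10, t=t))
    print(b_constant(num_actions=2, num_contexts=1, a=1.0, b=1.0))
```

With the other arguments fixed, `beta_t` grows only logarithmically in `t`, and neither function takes the number of agents as an argument, in line with the claim that the bound is independent of it.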

Figures (5)

  • Figure 1: MF-GP-UCB utilises invariance under permutation of actions to achieve a regret bound independent of the number of agents. Each row on the left represents a set of actions of the four agents, with the action of the representative agent (RA) in orange and the others in blue. Top: For a fixed RA action $x_1$, the function $f_A$ maps distinct permutations of action vector $\mathbf{x}$ to different values. Bottom: The function $f_B$ utilises the mean-field assumption, which converts permutations of action vector $\mathbf{x}$ into an identical distribution $\xi$, to output a single value for the RA given the observed distribution.
  • Figure 2: MF-GP-UCB is superior in both sample efficiency and solution quality compared to the benchmarks over a range of black-box dimensions $M$. When the black box satisfies the mean-field assumption, our algorithm inevitably has the advantage by optimising over the distribution of actions instead of over the interactions of individual actions.
  • Figure 3: Arena Histogram -- a visual example of a solution suggested by MF-GP-UCB. The reward term encourages separating the supporters of the two teams/contexts around the pitch, while the penalty term ensures there is no extreme congregation in just two directly opposite seating areas. The congestion factor $\sigma$ controls the "smoothness" of the optimal histograms. This solution produced a black-box reward value $r(\mathbf{x})\approx0.9$.
  • Figure 4: Manhattan Island -- NYC Taxi ground truth demand in a $12\times20$ heatmap and best solutions found by respective algorithms. The solution found by MF-GP-UCB is clearly closer to the ground truth than that of TuRBO, which is closer to random. This is further confirmed in \ref{fig:NYC-12x20-}. Leveraging the mean-field assumption gives MF-GP-UCB the advantage, while TuRBO is optimising individual actions over $M=20,000$ dimensions and thus not getting much further than uniformity in $T=250$ iterations.
  • Figure 5: NYC Taxi -- the scaling power of MF-GP-UCB is made more apparent in this experiment: where TuRBO struggles to do much better than random in the 250 iteration regime, our algorithm finds great solutions in all three grid sizes. The black-box dimension of this experiment is $M=20,000$ with the denoted action spaces and a single context $|C|=1$. This experiment is a scaled-up version of the LouVelo bike-sharing function.
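
The permutation invariance described in the Figure 1 caption can be illustrated with a small sketch: every reordering of an action vector $\mathbf{x}$ collapses to the same empirical distribution $\xi$. This is an illustration under stated assumptions, not the authors' implementation. The helper name `empirical_distribution` and the toy action space are hypothetical, and the sketch counts all agents' actions, whereas the paper may treat the representative agent separately.

```python
from collections import Counter
from itertools import permutations


def empirical_distribution(actions, action_space):
    """Map an action vector to the fraction of agents choosing each action."""
    counts = Counter(actions)
    n = len(actions)
    # Counter returns 0 for actions that no agent selected.
    return tuple(counts[a] / n for a in action_space)


# Toy example (assumption): four agents, three discrete actions.
action_space = (0, 1, 2)
x = (0, 2, 2, 1)

# Collect the distribution produced by every permutation of x.
distributions = {empirical_distribution(p, action_space) for p in permutations(x)}
print(distributions)
```

All 24 orderings of `x` yield one and the same distribution, so a function defined on $\xi$ has far fewer distinct inputs to learn than a function defined on the ordered vector $\mathbf{x}$.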

Theorems & Definitions (3)

  • Theorem 4.1
  • proof
  • proof