Bayesian Analysis of Combinatorial Gaussian Process Bandits

Jack Sandberg; Niklas Åkerblom; Morteza Haghir Chehreghani

TL;DR

This work studies Bayesian learning in the combinatorial volatile Gaussian process semi-bandit, where in each round the agent selects a subset (super arm) from a potentially infinite set of base arms and observes additive semi-bandit rewards. It analyzes GP-UCB, GP-BayesUCB and GP-TS in this setting and gives what is, to the authors' knowledge, the first regret bound for GP-BayesUCB. Results for finite and infinite arm sets are obtained via discretization, using the maximal information gain $\gamma_T$ and the kernel-driven information structure. The authors prove sublinear Bayesian regret bounds of the form $\mathrm{BR}(T)=\mathcal{O}(\sqrt{\lambda^*_K T K \beta_T \gamma_{TK}})$ under suitable conditions. They apply the framework to online energy-efficient navigation on real road networks, where TS-based methods achieve lower regret and better exploration efficiency than UCB variants and Bayesian baselines. The practical impact lies in scalable, context-aware planning under uncertainty for energy-aware routing and related combinatorial decision problems.

Abstract

We consider the combinatorial volatile Gaussian process (GP) semi-bandit problem. Each round, an agent is provided a set of available base arms and must select a subset of them to maximize the long-term cumulative reward. We study the Bayesian setting and provide novel Bayesian cumulative regret bounds for three GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS. Our bounds extend previous results for GP-UCB and GP-TS to the infinite, volatile and combinatorial setting, and to the best of our knowledge, we provide the first regret bound for GP-BayesUCB. Volatile arms encompass other widely considered bandit problems such as contextual bandits. Furthermore, we employ our framework to address the challenging real-world problem of online energy-efficient navigation, where we demonstrate its effectiveness compared to the alternatives.
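To make the setting concrete, the following is a minimal sketch of one round of a UCB-style super arm selection under volatile arms. It is not the authors' implementation: the function name `select_super_arm_ucb`, the dictionary-based posterior inputs, and the assumption that any subset of at most `k` available base arms is a feasible super arm are all illustrative. In the navigation application the feasible super arms would be paths, and the maximization would be done by a shortest-path oracle instead of a sort.

```python
import math
from typing import Dict, Hashable, List


def select_super_arm_ucb(
    mean: Dict[Hashable, float],
    std: Dict[Hashable, float],
    available: List[Hashable],
    k: int,
    beta_t: float,
) -> List[Hashable]:
    """Pick the k available base arms with the largest UCB index.

    mean/std are assumed to hold the GP posterior mean and standard
    deviation of each base arm (computing them is out of scope here).
    Because rewards are additive, maximizing the sum of indices over
    size-k subsets reduces to taking the k largest indices.
    """
    # Standard GP-UCB index: posterior mean plus sqrt(beta_t) * posterior std.
    index = {a: mean[a] + math.sqrt(beta_t) * std[a] for a in available}
    # Only arms offered this round (the volatile set) are candidates.
    return sorted(available, key=lambda a: index[a], reverse=True)[:k]


# Hypothetical posterior for three base arms.
mean = {"a": 1.0, "b": 0.5, "c": 0.9}
std = {"a": 0.1, "b": 1.0, "c": 0.1}

explore = select_super_arm_ucb(mean, std, ["a", "b", "c"], k=2, beta_t=4.0)
greedy = select_super_arm_ucb(mean, std, ["a", "b", "c"], k=2, beta_t=0.0)
volatile = select_super_arm_ucb(mean, std, ["a", "c"], k=2, beta_t=4.0)
```

With a large `beta_t` the uncertain arm `"b"` enters the super arm; with `beta_t = 0` the rule is purely greedy on the posterior mean.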

Paper Structure (31 sections, 20 theorems, 59 equations, 12 figures, 2 tables, 3 algorithms)

Introduction
Setup and Algorithms
Problem formulation
Bayesian framework for combinatorial Gaussian process bandits
Information gain
Regret Analysis
Finite case
Infinite case
Experiments
Bandit formulation of online energy efficient navigation problem
The online energy-efficient navigation problem
Shortest paths with rectified Gaussians
GP regression for energy-efficient navigation
Bayesian inference for energy-efficient navigation
Real-world road networks
...and 16 more sections

Key Result

lemma 1

Let $C_\omega = \left( \sqrt{\pi}\, \omega / \sqrt{2e (\omega - 1)} \right)^{1/\omega}$. Then, for GP-BUCB with confidence parameter $\beta_t = 2 \left( \operatorname{erf}^{-1}(1 - 2 \eta_t ) \right)^2$ and $\eta_t = \frac{(\sqrt{2\pi})^\omega}{2|\mathcal{A}|^\omega t^\xi}$, $\xi > 0$, $\omega$ …
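The confidence schedule in the lemma can be evaluated numerically. The sketch below relies on the identity $\operatorname{erf}^{-1}(1 - 2\eta) = \Phi^{-1}(1 - \eta)/\sqrt{2}$, so that $\beta_t = \left(\Phi^{-1}(1 - \eta_t)\right)^2$, i.e. $\sqrt{\beta_t}$ is the $(1-\eta_t)$ quantile of a standard normal. The function names and the parameter values (`num_arms=10`, `omega=2`, `xi=1`) are illustrative assumptions; the lemma's admissible range for $\omega$ is cut off in the text above, and $(\sqrt{2\pi})^\omega$ is read as the $\omega$-th power of $\sqrt{2\pi}$.

```python
import math
from statistics import NormalDist


def eta(t: int, num_arms: int, omega: float, xi: float) -> float:
    """eta_t = (sqrt(2*pi))**omega / (2 * |A|**omega * t**xi)."""
    return math.sqrt(2.0 * math.pi) ** omega / (2.0 * num_arms**omega * t**xi)


def beta(t: int, num_arms: int, omega: float, xi: float) -> float:
    """beta_t = 2 * erfinv(1 - 2*eta_t)**2, computed via the normal quantile."""
    e = eta(t, num_arms, omega, xi)
    if not 0.0 < e < 1.0:
        raise ValueError("eta_t must lie in (0, 1) for the quantile to exist")
    # erfinv(1 - 2e) == Phi^{-1}(1 - e) / sqrt(2), hence beta_t == quantile**2.
    quantile = NormalDist().inv_cdf(1.0 - e)
    return quantile * quantile


# Illustrative (assumed) parameters: 10 base arms, omega = 2, xi = 1.
betas = [beta(t, num_arms=10, omega=2.0, xi=1.0) for t in (1, 10, 100)]
```

Since $\eta_t$ shrinks polynomially in $t$, the quantile level $1 - \eta_t$ approaches 1 and $\beta_t$ grows slowly with the round index.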

Figures (12)

Figure 1: Road networks of Luxembourg (left) and Monaco (right) with evaluation routes A and B highlighted in blue and red.
Figure 2: Cumulative regret for UCB, BUCB and TS using GP and Bayesian inference (BI) methods. The lines and regions correspond to the mean and $\pm 1$ standard error.
Figure 3: Cumulative regret of GP-UCB and GP-BUCB (left and middle column) for different parametrizations of $\beta_t$ (right column).
Figure 4: Cumulative regret of GP-BUCB, BI-BUCB, GP-TS and BI-TS for varying prior lengthscale values $\ell$.
Figure 5: Cumulative regret at $t = 500$ for varying prior lengthscale values. Errorbars correspond to $\pm 1$ standard error.
...and 7 more figures

Theorems & Definitions (38)

lemma 1
theorem 3.1: Finite regret bounds
lemma 2
theorem 3.2: Infinite regret bounds
lemma 3
proof
lemma 4
proof
lemma 4
proof
...and 28 more
