Batched Kernelized Bandits: Refinements and Extensions

Chenkai Ma, Keqin Chen, Jonathan Scarlett

Abstract

In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and refine and extend existing results on regret bounds. For algorithmic upper bounds, Li and Scarlett (2022) show that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ in the regret bound. For algorithm-independent lower bounds, noticing that existing results only apply when the batch sizes are fixed in advance, we present novel lower bounds for the case where the batch sizes are chosen adaptively, and show that adaptive batches have essentially the same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, show that a suitably-defined cumulative regret notion incurs the same bound as in the non-robust setting, and derive a simple regret bound significantly below that of previous work.

Paper Structure

This paper contains 31 sections, 24 theorems, 96 equations, 2 figures, 3 tables, 2 algorithms.

Key Result

Lemma 1

(Li and Scarlett, 2022) Under the setup of Section sec:setup_standard and using the batch sizes defined in eq:grow_b_batch_sizes_li, for any $\delta \in (0,1)$, the BPE algorithm yields, with probability at least $1-\delta$, a regret bound [equation not recovered from the source], where $\Lambda=\Psi+\sqrt{\log(|\mathcal{X}|/\delta)}$.

Figures (2)

  • Figure 1: Illustration of a class of hard-to-distinguish functions $\mathcal{F}$, where any $x\in\mathcal{X}$ can be $\epsilon$-optimal for at most one bump function. This is an "idealized" illustration, with the actual functions used having infinite support but steady decay to zero.
  • Figure 2: Synthetic functions sampled using different kernels.

Theorems & Definitions (51)

  • Lemma 1
  • Remark 1
  • Theorem 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 2
  • Theorem 2
  • Remark 5
  • Remark 6
  • ...and 41 more