Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards

Marco Mussi; Simone Drago; Alberto Maria Metelli

TL;DR

The paper tackles the absence of tight theoretical guarantees for kernelized bandits with Bernoulli rewards within the RKHS framework, where $f:\mathcal{X} \rightarrow [0,1]$ and $f \in \mathcal{H}_{k}$ with $\|f\|_{\mathcal{H}_{k}} \le B$. It surveys results for subgaussian noise and Bernoulli multi-armed bandits and clarifies that the kernelized Bernoulli case remains open. It articulates three open problems (Estimation, Concentration, and Regret Minimization) and discusses challenges around Bayesian updates (e.g., GP-based or Beta-process approaches), concentration inequalities, and instance-dependent regret, connecting to KL-based bounds such as KL-UCB. By mapping the landscape and proposing concrete research directions, the work aims to guide theory and practice for kernelized binary feedback.
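For readers unfamiliar with the KL-based bounds mentioned above, the following is a minimal sketch of the KL-UCB index for a single Bernoulli arm in the standard (non-kernelized) multi-armed bandit setting. The function names, the bisection tolerance, and the default `c=0.0` for the `log log t` term are illustrative assumptions, not choices taken from the paper.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Ber(p) and Ber(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log t + c * log log t.

    `mean` is the empirical mean of the arm, `pulls` its number of pulls,
    and `t` the current round (t >= 1). Solved by bisection, since
    q -> kl(mean, q) is increasing on [mean, 1].
    """
    log_t = math.log(t)
    budget = (log_t + c * math.log(max(log_t, 1.0))) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo

# Usage: the index shrinks toward the empirical mean as pulls grow.
few = kl_ucb_index(0.5, pulls=10, t=100)
many = kl_ucb_index(0.5, pulls=1000, t=100)
```

The index is computed per arm and uses no kernel structure; how to obtain an analogous confidence bound when arms are coupled through $\mathcal{H}_k$ is the kind of question the open problems raise.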

Abstract

We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which observations of the form $f(\mathbf{x}_t)+\varepsilon_t$ are available, where $\varepsilon_t$ is subgaussian noise (Chowdhury and Gopalan, 2017). In contrast, we focus on the case in which we observe realizations $y_t \sim \text{Ber}(f(\mathbf{x}_t))$ sampled from a Bernoulli distribution with parameter $f(\mathbf{x}_t)$. While the Bernoulli model has been investigated successfully in multi-armed bandits (Garivier and Cappé, 2011), logistic bandits (Faury et al., 2022), and bandits in metric spaces (Magureanu et al., 2014), it remains an open question whether tight results can be obtained for KBs. This paper aims to draw the attention of the online learning community to this open problem.
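The two feedback models contrasted in the abstract can be simulated side by side. The sketch below is illustrative only: the squared-exponential kernel, the lengthscale, the centers and weights defining $f$, and the noise scale `0.1` are all assumptions made for the example, not values from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def make_rkhs_function(centers, weights, lengthscale=0.2):
    """Build f(x) = sum_i w_i k(x, c_i) and return it with its RKHS norm.

    For a finite kernel expansion, ||f||^2 = w^T K w with K the Gram
    matrix of the centers.
    """
    def f(x):
        return rbf_kernel(np.atleast_1d(x), centers, lengthscale) @ weights
    gram = rbf_kernel(centers, centers, lengthscale)
    norm = float(np.sqrt(weights @ gram @ weights))
    return f, norm

rng = np.random.default_rng(0)
centers = np.array([0.2, 0.5, 0.8])
# Nonnegative weights summing to 1 keep f inside [0, 1], because every
# kernel value lies in (0, 1].
weights = np.array([0.3, 0.5, 0.2])
f, rkhs_norm = make_rkhs_function(centers, weights)

x = rng.uniform(0.0, 1.0, size=1000)
means = f(x)

# Subgaussian model: real-valued observation f(x_t) + eps_t.
y_subgaussian = means + 0.1 * rng.standard_normal(x.shape)

# Bernoulli model: binary observation y_t ~ Ber(f(x_t)).
y_bernoulli = rng.binomial(1, means)

# The Bernoulli noise variance depends on the unknown mean.
bernoulli_variance = means * (1.0 - means)
```

In the Bernoulli model the variance $f(\mathbf{x}_t)(1 - f(\mathbf{x}_t))$ changes with the unknown function, whereas the subgaussian model here uses a single fixed noise scale.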

Paper Structure (6 sections, 7 equations, 1 figure, 1 table)

Introduction
Problem Formulation
Open Problems
Open Problem 1: Estimation
Open Problem 2: Concentration
Open Problem 3: Regret Minimization

Figures (1)

Figure 1: GP estimate example of $f \in [0,1]$.
