Generalized Kernelized Bandits: A Novel Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
Authors
Alberto Maria Metelli, Simone Drago, Marco Mussi
Abstract
We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) reward model whose mean is a non-linear function $\mu(f(\cdot))$. This setting extends both kernelized bandits (KBs) and generalized linear bandits (GLBs), providing a unified view of the two. We propose an optimistic regret minimization algorithm, GKB-UCB, and we explain why the existing self-normalized concentration inequalities used for KBs and GLBs do not allow us to obtain tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality that applies to a Hilbert space of functions with bounded norm, a contribution of independent interest. Based on it, we analyze GKB-UCB, deriving a regret bound of order $\widetilde{\mathcal{O}}(\gamma_T \sqrt{T/\kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the expected reward non-linearity. Our result is tight in its dependence on $T$, $\gamma_T$, and $\kappa_*$ for both KBs and GLBs. Finally, we present a tractable version of GKB-UCB, Trac-GKB-UCB, which attains similar regret guarantees, and we discuss its time and space complexity.
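To make the setting concrete, the following LaTeX sketch spells out the reward model and the regret notion summarized above; the symbols $\mathcal{H}_k$, $B$, $\mathbf{x}_t$, $y_t$, and $R(T)$ are illustrative notation following standard bandit conventions, not taken verbatim from the paper.

% Illustrative sketch of the GKB setting (notation assumed for exposition).
% The unknown function f lies in an RKHS \mathcal{H}_k and has bounded norm:
%   f \in \mathcal{H}_k, \qquad \|f\|_{\mathcal{H}_k} \le B.
% At each round t, the learner picks an action x_t and observes a reward y_t
% drawn from an exponential family model whose mean is the non-linear
% transformation \mu of f(x_t):
\[
  y_t \sim p\big(\cdot \mid f(\mathbf{x}_t)\big),
  \qquad
  \mathbb{E}\left[ y_t \mid \mathbf{x}_t \right] = \mu\big(f(\mathbf{x}_t)\big).
\]
% The learner minimizes the cumulative regret over the learning horizon T,
% measured against an optimal action x^\star:
\[
  R(T) = \sum_{t=1}^{T} \Big( \mu\big(f(\mathbf{x}^\star)\big) - \mu\big(f(\mathbf{x}_t)\big) \Big),
  \qquad
  \mathbf{x}^\star \in \operatorname*{arg\,max}_{\mathbf{x} \in \mathcal{X}} \, \mu\big(f(\mathbf{x})\big).
\]

Under this reading, taking $\mu$ to be the identity recovers KBs, while taking $\mathcal{H}_k$ to be a finite-dimensional linear space recovers GLBs, which is the unification the abstract refers to.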