Order-Optimal Regret in Distributed Kernel Bandits using Uniform Sampling with Shared Randomness

Nikola Pavlovic; Sudeep Salgia; Qing Zhao

TL;DR

This work develops the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both the number of agents $N$ and the time horizon $T$.

Abstract

We consider distributed kernel bandits where $N$ agents aim to collaboratively maximize an unknown reward function that lies in a reproducing kernel Hilbert space. Each agent sequentially queries the function to obtain noisy observations at the query points. Agents can share information through a central server, with the objective of minimizing the regret accumulated over the time horizon $T$ and aggregated over the agents. We develop the first algorithm that achieves the optimal regret order (as defined by centralized learning) with a communication cost that is sublinear in both $N$ and $T$. The key features of the proposed algorithm are uniform exploration at the local agents and shared randomness with the central server. Together with a sparse approximation of the Gaussian process (GP) model, these two components make it possible to preserve the learning rate of the centralized setting at a diminishing rate of communication.
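
The role of shared randomness can be illustrated with a small sketch. It is not the DUETS algorithm itself, only one plausible reading of why a common random seed can reduce communication: if an agent explores uniformly at random with a seed the server also knows, the server can regenerate the agent's query points locally, so the points themselves need not be sent. The function names, the seed value, the domain $[0, 1]^d$, and the toy reward below are all assumptions made for illustration.

```python
import random


def sample_uniform_points(seed, num_points, dim):
    """Draw num_points points uniformly at random from [0, 1]^dim.

    The generator is seeded, so any party that knows `seed` gets the same points.
    """
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(dim)) for _ in range(num_points)]


def toy_reward(x):
    """Hypothetical stand-in for the unknown reward function."""
    return -sum((xi - 0.5) ** 2 for xi in x)


SHARED_SEED = 12345  # hypothetical seed agreed on by agent and server beforehand

# Agent side: explore uniformly and collect noisy observations.
agent_points = sample_uniform_points(SHARED_SEED, num_points=5, dim=2)
noise_rng = random.Random(0)  # the agent's private observation noise
observations = [toy_reward(x) + noise_rng.gauss(0.0, 0.1) for x in agent_points]

# Server side: regenerate the same query points from the shared seed,
# so only `observations` would have to be transmitted in this toy setup.
server_points = sample_uniform_points(SHARED_SEED, num_points=5, dim=2)
print(server_points == agent_points)  # prints True
```

In this toy setup the message from agent to server carries only the observed values. The sparse GP approximation mentioned in the abstract is a separate component and is not modeled here.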

Paper Structure (15 sections, 8 theorems, 28 equations, 1 figure, 2 algorithms)


Introduction
Distributed Kernel Bandits
Main Results
Related Work
Problem Formulation
GP Models
Sparse approximation of GP models
The DUETS Algorithm
Performance Analysis
Empirical Studies
Appendix A.
Proof of Theorem \ref{Main_Theorem}
Proof of Lemma \ref{grid_bound}
Proof of Lemma \ref{lemma:epoch_number}
Proof of Lemma \ref{lemma:inducing_set_size}

Key Result

Lemma 2.5

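(Vakili_Kernel_Simple_Regret) Assume that the bounded RKHS norm assumption (bounded_RKHS_norm) and the sub-Gaussian noise assumption (Sub_Gaussian) hold. Given a set of observations $\{\mathbf{X}_m,\mathbf{Y}_m\}$ as described above, such that the query points $\mathbf{X}_m$ are chosen independently of the noise sequence, then for a fixed $x\in \mathcal{X}$, the following holds with probability at least $1-\delta$: $|f(x) - \mu_m(x)| \leq \beta(\delta)\sigma_m(x)$, where $\mu_m$ and $\sigma_m$ denote the GP posterior mean and standard deviation computed from the $m$ observations and $\beta(\delta) =B+R\sqrt{\frac{2}{\lambda}\log{\left(\frac{2}{\delta}\right)}}$.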
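
As a worked illustration of the multiplier $\beta(\delta)$ in the lemma, the sketch below evaluates the formula directly. The function name `beta` and the numeric values in the usage line are assumptions; reading $B$ as the RKHS norm bound, $R$ as the sub-Gaussian noise parameter, and $\lambda$ as the regularization parameter follows from the assumption names but is not spelled out in this excerpt.

```python
import math


def beta(delta, B, R, lam):
    """Evaluate beta(delta) = B + R * sqrt((2 / lam) * log(2 / delta)).

    delta: failure probability, must lie in (0, 1)
    B:     assumed bound on the RKHS norm of the reward function
    R:     assumed sub-Gaussian parameter of the observation noise
    lam:   assumed regularization parameter, must be positive
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    return B + R * math.sqrt((2.0 / lam) * math.log(2.0 / delta))


# Hypothetical parameter values, chosen only to show the call.
width_multiplier = beta(delta=0.05, B=1.0, R=0.1, lam=0.01)
print(width_multiplier)
```

A smaller $\delta$ gives a larger multiplier, so a higher confidence level costs a wider interval, but only through the slowly growing factor $\sqrt{\log(2/\delta)}$.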

Figures (1)

Figure 1: Cumulative regret (\ref{fig:cosine_regret}-\ref{fig:hartmann_regret}) and communication cost (\ref{fig:cosine_comm}-\ref{fig:hartmann_comm}) for all algorithms across different benchmark functions, averaged over $5$ Monte Carlo runs. The shaded regions represent error bars corresponding to one standard deviation. In these plots, DUETS outperforms the other algorithms in both regret and communication cost across all functions.

Theorems & Definitions (13)

Lemma 2.5
Theorem 4.1
proof
Lemma 4.2
Lemma 4.3
Lemma 4.4
Lemma 4.5
proof
Definition A.1
Lemma A.2: Adapted from Calandriello_Sketching
...and 3 more
