Near-Optimal Algorithm for Non-Stationary Kernelized Bandits

Shogo Iwazaki, Shion Takeno

TL;DR

The first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels is shown, which reveals that an existing optimization-based KB algorithm with slight modification is near-optimal.

Abstract

This paper studies a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization, where one seeks to minimize the regret under an unknown reward function that varies over time. In particular, we focus on a near-optimal algorithm whose regret upper bound matches the regret lower bound. To this end, we show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels, which reveals that an existing optimization-based KB algorithm with slight modification is near-optimal. However, this existing algorithm is impractical because of its huge computational cost. Therefore, we propose a novel near-optimal algorithm called restarting phased elimination with random permutation (R-PERP), which avoids this computational cost. The key technical idea is a simple random permutation of the query candidates, which enables us to derive a novel, tighter confidence bound tailored to non-stationary problems.
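
For orientation, the quantities mentioned in the abstract are commonly formalized as below in the non-stationary bandit literature. This is a sketch under assumptions, not a quotation from the paper: the symbols $f_t$ (reward function at round $t$) and $x_t$ (query chosen at round $t$) do not appear in this summary, and reading $V_T$ as a total-variation budget on the reward sequence is an assumed interpretation of the $V_T$ that appears in Theorem 1.

$$
R_T = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} f_t(x) - f_t(x_t) \Bigr),
\qquad
\sum_{t=1}^{T-1} \lVert f_{t+1} - f_t \rVert_\infty \le V_T .
$$

Under this reading, a regret lower bound states that no algorithm can keep $R_T$ below a certain rate in $T$ and $V_T$ for every admissible reward sequence, and an algorithm is near-optimal when its upper bound on $R_T$ matches that rate up to logarithmic factors.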

Paper Structure

This paper contains 34 sections, 15 theorems, 55 equations, 2 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Fix any $T \in \mathbb{N}_+$, $V_T > 0$, $B > 0$, and $\rho > 0$. Furthermore, assume $\mathcal{X} = [0, 1]^d$, and $(\epsilon_t)_{t \in \mathbb{N}_+}$ is the noise sequence of independent Gaussian random variables $\epsilon_t \sim \mathcal{N}(0, \rho^2)$ for all $t \in \mathbb{N}_+$. Here, $\widetilde{C}_{\mathrm{SE}}, C_{\mathrm{SE}}, \widetilde{C}_{\mathrm{Mat}}, C_{\mathrm{Mat}} > 0$ are cons

Figures (2)

  • Figure 1: Illustration of the R-PERP algorithm in the first batch at the first interval with $N_1^{(1)} = 5$. The standard PE algorithm (left) chooses the query points deterministically within each batch. Thus, intuitively, the environment can arbitrarily choose the reward function $f_{1,j}^{1}$ such that the learner's estimation error becomes large. The R-PERP algorithm (right) alleviates the effect of such worst-case selection of the reward functions by randomly permuting the query candidates of the standard PE.
  • Figure 2: Numerical experiment results with $5$ different seeds. The plots show the average cumulative regret, and the error bars represent one standard error. The left and right plots show the results with SE kernel and Matérn kernel with $\nu = 5/2$, respectively.
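
The permutation step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `deterministic_batch` and `permuted_batch`, the candidate points, and the per-point repetition counts are all hypothetical, and the way a phased-elimination batch actually selects its points is not reproduced here. The sketch only shows that the randomized batch queries the same multiset of points as the deterministic one, in a uniformly random order.

```python
import random


def deterministic_batch(points, counts):
    """Fixed query order: each point repeated counts[i] times.

    Hypothetical stand-in for the batch a standard PE step would produce.
    """
    batch = []
    for point, n in zip(points, counts):
        batch.extend([point] * n)
    return batch


def permuted_batch(points, counts, rng):
    """Same multiset of queries as deterministic_batch, in random order."""
    batch = deterministic_batch(points, counts)
    rng.shuffle(batch)  # in-place uniform shuffle
    return batch


rng = random.Random(0)                 # seeded only for reproducibility
points = [(0.1,), (0.5,), (0.9,)]      # assumed candidates in [0, 1]^d, d = 1
counts = [2, 2, 1]                     # five queries, as in N = 5 in Figure 1
fixed = deterministic_batch(points, counts)
shuffled = permuted_batch(points, counts, rng)
```

Because the order is drawn after the batch is fixed, an environment that commits to its reward functions in advance cannot align a particular function with a particular query point, which is the intuition the caption gives for the random permutation.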

Theorems & Definitions (28)

  • Theorem 1: Lower bound
  • Lemma 2
  • Theorem 3: The modified version of OPKB algorithm with the restart-reset strategy.
  • Theorem 4: Confidence bounds for average functions.
  • Theorem 5: Regret upper bound of R-PERP
  • Remark 1
  • proof
  • Lemma 6
  • proof
  • Lemma 7: Theorem 3.1 in adamczak2016circular
  • ...and 18 more