High-dimensional Nonparametric Contextual Bandit Problem

Shogo Iwazaki; Junpei Komiyama; Masaaki Imaizumi

TL;DR

The paper tackles high-dimensional kernelized contextual bandits by leveraging a kernel ridgeless interpolation estimator within an explore-then-commit framework, enabling sublinear regret under spectral context conditions. It introduces two kernel classes (inner-product and RBF) with principled scaling by $1/d$ and derives bias-variance bounds for the interpolator, establishing no-regret when the effective dimension grows with the sample size. It also provides lenient-regret guarantees for non-vanishing generalization error and demonstrates superior empirical performance over kernel-UCB baselines and linear methods on simulations and the Avazu CTR dataset. The work advances nonparametric, high-dimensional bandit learning by bridging kernel interpolation theory with decision-making, offering practical algorithms for nonlinear, high-dimensional contexts.
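To make the combination of ridgeless interpolation and explore-then-commit concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes one shared context per round, a separate interpolator per arm, round-robin exploration, and an RBF kernel of the form $\exp(-\|x - x'\|^2 / d)$; the names `rbf_kernel`, `fit_ridgeless`, `explore_then_commit`, and `toy_reward` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def rbf_kernel(A, B):
    """RBF kernel with squared distances scaled by 1/d (assumed form)."""
    d = A.shape[1]
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / d)


def fit_ridgeless(X, y):
    """Minimum-norm interpolator: alpha = K^+ y, with no ridge penalty."""
    return np.linalg.pinv(rbf_kernel(X, X)) @ y


def predict(X_train, alpha, X_new):
    """Evaluate the fitted interpolator at the rows of X_new."""
    return rbf_kernel(X_new, X_train) @ alpha


def explore_then_commit(contexts, reward_fn, n_arms, n_explore, rng):
    """Pull arms round-robin for n_explore rounds, then act greedily."""
    if n_explore < n_arms:
        raise ValueError("need at least one exploration round per arm")
    T = contexts.shape[0]
    xs = [[] for _ in range(n_arms)]
    ys = [[] for _ in range(n_arms)]
    chosen = np.empty(T, dtype=int)
    models = None
    for t in range(T):
        x = contexts[t]
        if t < n_explore:
            # Exploration phase: observe a reward and store it for arm a.
            a = t % n_arms
            xs[a].append(x)
            ys[a].append(reward_fn(x, a, rng))
        else:
            if models is None:
                # Fit one interpolator per arm, once, at the end of exploration.
                models = []
                for i in range(n_arms):
                    Xi, yi = np.array(xs[i]), np.array(ys[i])
                    models.append((Xi, fit_ridgeless(Xi, yi)))
            # Commit phase: pick the arm with the largest predicted reward.
            preds = [predict(Xi, al, x[None, :])[0] for Xi, al in models]
            a = int(np.argmax(preds))
        chosen[t] = a
    return chosen, models


def toy_reward(x, a, rng):
    """Hypothetical nonlinear reward with Gaussian noise, for the demo only."""
    return np.sin(x[a]) + 0.1 * rng.standard_normal()


rng = np.random.default_rng(0)
contexts = rng.standard_normal((200, 50))  # T = 200 rounds, d = 50
arms, models = explore_then_commit(contexts, toy_reward, 3, 60, rng)
print(np.bincount(arms[60:], minlength=3))  # arm counts in the commit phase
```

The pseudoinverse stands in for the ridge-regularized solve used by kernel-UCB-style methods: the fitted function passes through every exploration sample, which is the interpolation regime the bias-variance bounds are about. The exploration length (60 here) is an arbitrary choice for the demo, not a value taken from the paper.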

Abstract

We consider the kernelized contextual bandit problem with a large feature space. This problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative reward by learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offer greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of $O(T)$ when we consider $\Omega(\log T)$ feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which allows a per-round regret of at most $\Delta > 0$. We derive the rate of lenient regret in terms of $\Delta$.
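
For reference, one common way to formalize lenient regret in the bandit literature ignores rounds whose gap is at most $\Delta$. The abstract does not state the definition, so the form below is an assumption and may differ from the one used in the paper. The symbols $x_t$, $a_t$, and $f$ (round-$t$ context, chosen arm, and mean-reward function) are notation introduced only for this illustration:

$$
\mathrm{gap}_t = \max_{a \in [K]} f(x_t, a) - f(x_t, a_t), \qquad R_T^{\Delta} = \sum_{t=1}^{T} \mathrm{gap}_t \cdot \mathbb{1}\{\mathrm{gap}_t > \Delta\}.
$$

Under this reading, setting $\Delta = 0$ recovers the standard cumulative regret, and a larger $\Delta$ tolerates more near-optimal choices without penalty.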
