Sparse Nonparametric Contextual Bandits

Hamish Flynn, Julia Olkhovskaya, Paul Rognon-Vael

TL;DR

This work introduces sparse nonparametric contextual bandits, allowing an infinite candidate feature set while enforcing sparsity in the reward representation. It shows minimax lower bounds that scale polynomially with the number of actions, implying inherent difficulty when $K$ is large relative to the horizon $n$. To address this, the authors propose a sparsity-aware Feel-Good Thompson Sampling (FGTS) algorithm and prove regret bounds that nearly match the lower bounds up to logarithmic factors, with explicit dependence on the effective feature dimension $d_{\mathrm{eff}}$ or the ambient dimension $d$. Across kernelised (countable and uncountable sparsity) and neural bandit settings, sparsity yields improved regret bounds once the horizon is large enough relative to sparsity and action count, highlighting practical regimes where flexible nonparametric models benefit from feature selection. The results connect nonparametric learning, compressed sensing ideas, and Bayesian-like exploration to deliver near-optimal performance in highly overparameterised contextual bandits.

Abstract

We study the benefits of sparsity in nonparametric contextual bandit problems, in which the set of candidate features is countably or uncountably infinite. Our contribution is two-fold. First, using a novel reduction to sequences of multi-armed bandit problems, we provide lower bounds on the minimax regret, which show that polynomial dependence on the number of actions is generally unavoidable in this setting. Second, we show that a variant of the Feel-Good Thompson Sampling algorithm enjoys regret bounds that match our lower bounds up to logarithmic factors of the horizon, and have logarithmic dependence on the effective number of candidate features. When we apply our results to kernelised and neural contextual bandits, we find that sparsity enables better regret bounds whenever the horizon is large enough relative to the sparsity and the number of actions.
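
To make the algorithmic idea in the abstract concrete, the following is a minimal sketch of Feel-Good Thompson Sampling over a finite hypothesis class. It is not the paper's algorithm: it assumes a uniform prior over a handful of 1-sparse candidate reward functions (each one reads a single feature), whereas the paper works with infinite candidate feature sets and a sparsity-aware prior. The names `fgts_weights`, `fgts_act`, `run_toy` and the values of `eta`, `lam` and `b` are illustrative assumptions. The posterior weights a hypothesis by its squared prediction error, plus a "feel-good" bonus that favours hypotheses predicting a high best-case reward.

```python
import math
import random


def fgts_weights(hypotheses, history, eta=1.0, lam=0.1, b=1.0):
    """Feel-good posterior weights over a finite hypothesis list (uniform prior).

    A hypothesis f maps (context, action) to a predicted mean reward.
    A context is a list with one feature vector per action.
    history is a list of (context, action, reward) tuples.
    """
    log_w = []
    for f in hypotheses:
        total = 0.0
        for context, action, reward in history:
            sq_loss = (f(context, action) - reward) ** 2
            # Feel-good bonus: capped best-case predicted reward in this context.
            best = max(f(context, a) for a in range(len(context)))
            total += -eta * sq_loss + lam * min(b, best)
        log_w.append(total)
    top = max(log_w)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(v - top) for v in log_w]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def fgts_act(hypotheses, history, context, rng, **kwargs):
    """Sample one hypothesis from the posterior and act greedily under it."""
    weights = fgts_weights(hypotheses, history, **kwargs)
    f = rng.choices(hypotheses, weights=weights, k=1)[0]
    return max(range(len(context)), key=lambda a: f(context, a))


def run_toy(n_rounds=200, n_features=5, n_actions=3,
            true_feature=2, noise_sd=0.1, seed=0):
    """Toy problem (hypothetical): the reward depends on one feature only."""
    rng = random.Random(seed)
    # Hypothesis j predicts the reward as the j-th feature of the chosen action.
    hypotheses = [lambda ctx, a, j=j: ctx[a][j] for j in range(n_features)]
    history = []
    for _ in range(n_rounds):
        context = [[rng.random() for _ in range(n_features)]
                   for _ in range(n_actions)]
        action = fgts_act(hypotheses, history, context, rng)
        reward = context[action][true_feature] + rng.gauss(0.0, noise_sd)
        history.append((context, action, reward))
    return fgts_weights(hypotheses, history)
```

Calling `run_toy()` returns the final posterior weights over the five candidate features. In this toy setting the weight should concentrate on the feature that actually generates the rewards, which is the feature-selection behaviour that a sparsity-aware prior is meant to encourage.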

Paper Structure

This paper contains 45 sections, 19 theorems, 130 equations, 3 tables, 1 algorithm.

Key Result

Theorem 3

Consider the sparse nonparametric contextual bandit problem with countable sparsity described in Section sec:setting. Let $\mathcal{A} = [K]$ for some $K \geq 2$ and assume that the noise variables are standard Gaussian, i.e. $\epsilon_t \sim \mathcal{N}(0, 1)$. Suppose that for some $\beta > 1$ and ... Instead, suppose that for some $\beta > 0$ and some integer $m \geq \lceil 1/\beta\rceil s^2K\exp(s$ ...

Theorems & Definitions (21)

  • Definition 1: Uniform decay
  • Definition 2: Uniform Lipschitz continuity
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma 7: Exercise 24.1 in Lattimore and Szepesvári (2020)
  • Lemma 8: Regret decomposition
  • Lemma 9: Lower bound for sequences of $K$-armed bandits
  • Theorem 10: Theorem 1 in Zhang (2022)
  • ...and 11 more