Sparse Nonparametric Contextual Bandits

Hamish Flynn; Julia Olkhovskaya; Paul Rognon-Vael

TL;DR

This work introduces sparse nonparametric contextual bandits, in which the set of candidate features may be infinite while the reward representation is required to be sparse. The authors prove minimax lower bounds on the regret that scale polynomially with the number of actions $K$, so the problem is inherently hard when $K$ is large relative to the horizon $n$. To address this, they propose a sparsity-aware variant of Feel-Good Thompson Sampling (FGTS) and prove regret bounds that match the lower bounds up to logarithmic factors of the horizon, with explicit dependence on the effective feature dimension $d_{\mathrm{eff}}$ or the ambient dimension $d$. In kernelised bandits (with countable or uncountable sparsity) and in neural bandits, sparsity yields better regret bounds once the horizon is large enough relative to the sparsity and the number of actions, which identifies the regimes where flexible nonparametric models benefit from feature selection. The results connect nonparametric learning, ideas from compressed sensing, and Bayesian-style exploration to obtain near-optimal regret in highly overparameterised contextual bandits.

Abstract

We study the benefits of sparsity in nonparametric contextual bandit problems, in which the set of candidate features is countably or uncountably infinite. Our contribution is two-fold. First, using a novel reduction to sequences of multi-armed bandit problems, we provide lower bounds on the minimax regret, which show that polynomial dependence on the number of actions is generally unavoidable in this setting. Second, we show that a variant of the Feel-Good Thompson Sampling algorithm enjoys regret bounds that match our lower bounds up to logarithmic factors of the horizon, and have logarithmic dependence on the effective number of candidate features. When we apply our results to kernelised and neural contextual bandits, we find that sparsity enables better regret bounds whenever the horizon is large enough relative to the sparsity and the number of actions.
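
To make the algorithmic idea concrete, the following is a minimal sketch of standard Feel-Good Thompson Sampling on a toy problem. It is not the authors' sparsity-aware variant. It assumes a finite class of `d` one-sparse models (model `j` predicts the reward as the `j`-th feature of the chosen action), a uniform prior, and illustrative hyperparameters `eta`, `lam`, and `b`; all function names and the simulated environment in `run` are hypothetical.

```python
import math
import random


def fgts_posterior(d, history, eta=1.0, lam=0.1, b=1.0):
    """Posterior over d one-sparse models under a uniform prior.

    Model j predicts reward x[a][j]. Each past round contributes
    -eta * squared_error + lam * min(b, best predicted reward),
    where the second term is the "feel-good" optimism bonus.
    """
    log_w = []
    for j in range(d):
        total = 0.0
        for x, a, r in history:
            sq_loss = (x[a][j] - r) ** 2
            feel_good = min(b, max(row[j] for row in x))
            total += -eta * sq_loss + lam * feel_good
        log_w.append(total)
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def fgts_act(d, history, x, rng, **kw):
    """Sample one model from the posterior and act greedily under it."""
    post = fgts_posterior(d, history, **kw)
    j = rng.choices(range(d), weights=post)[0]
    return max(range(len(x)), key=lambda a: x[a][j])


def run(n=100, d=20, K=3, true_j=7, seed=0):
    """Simulate n rounds where the reward depends only on feature true_j."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        # Context: one feature vector of length d per action.
        x = [[rng.uniform(0, 1) for _ in range(d)] for _ in range(K)]
        a = fgts_act(d, history, x, rng)
        r = x[a][true_j] + rng.gauss(0, 0.1)
        history.append((x, a, r))
    return fgts_posterior(d, history)
```

Calling `run()` returns the final posterior over the `d` candidate features. In this toy setting the posterior is expected to concentrate on the single relevant feature, which is the feature-selection effect that the sparse setting exploits. Recomputing the posterior from scratch each round keeps the sketch short at the cost of efficiency.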
