Sparse Additive Contextual Bandits: A Nonparametric Approach for Online Decision-Making with High-Dimensional Covariates

Wenjia Wang, Qingwen Zhang, Xiaowei Zhang

TL;DR

SPARKLE addresses online decision-making with high-dimensional covariates and nonlinear rewards by imposing a sparse additive RKHS reward model and fitting it with a doubly penalized estimator. The epoch-based SPARKLE algorithm uses adaptive screening to balance exploration and exploitation, achieving sublinear regret with logarithmic dependence on the covariate dimension and a dependence on reward smoothness governed by the RKHS parameter $m$. The authors establish both an upper bound and an information-theoretic lower bound, prove the key geometric property of $\mathscr{C}$-regularity for the time-varying sample supports, and demonstrate strong empirical performance on synthetic data and on real-world tasks such as video recommendation and warfarin dosing. This work bridges parametric and nonparametric online learning, offering scalable, interpretable, and theoretically grounded methods for complex, high-dimensional contextual bandits.
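For orientation, the display below sketches one common way a sparse additive reward model and a doubly penalized estimator are written in the additive-regression literature. It is an illustration under stated assumptions, not the paper's exact formulation: the arm index $k$, active set $S_k$, sparsity level $s$, component functions $f_{k,j}$, RKHS $\mathcal{H}$, empirical norm $\|\cdot\|_n$, and tuning parameters $\lambda_n, \rho_n$ are all notation introduced here.

```latex
% Sparse additive mean reward for arm k: only s of the d coordinates matter.
f_k(\bm{x}) \;=\; \sum_{j \in S_k} f_{k,j}(x_j),
\qquad S_k \subseteq \{1,\ldots,d\}, \quad |S_k| \le s \ll d .

% A generic doubly penalized least-squares fit over additive functions
% f = \sum_{j=1}^{d} f_j with each f_j \in \mathcal{H}:
% the empirical-norm penalty encourages sparsity across components,
% the RKHS-norm penalty controls the smoothness of each component.
\widehat{f} \;\in\; \operatorname*{arg\,min}_{f = \sum_{j=1}^{d} f_j,\; f_j \in \mathcal{H}}
\; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(\bm{x}_i) \bigr)^2
\;+\; \sum_{j=1}^{d} \Bigl( \lambda_n \, \| f_j \|_n + \rho_n \, \| f_j \|_{\mathcal{H}} \Bigr),
\qquad \| f_j \|_n^2 = \frac{1}{n} \sum_{i=1}^{n} f_j(x_{ij})^2 .
```

The two penalties play different roles, which is why the estimator is called doubly penalized: one selects which coordinates enter the model, the other regularizes how rough each selected component may be.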

Abstract

Personalized services are central to today's digital economy, and their sequential decisions are often modeled as contextual bandits. Modern applications pose two main challenges: high-dimensional covariates and the need for nonparametric models to capture complex reward-covariate relationships. We propose a contextual bandit algorithm based on a sparse additive reward model that addresses both challenges through (i) a doubly penalized estimator for nonparametric reward estimation and (ii) an epoch-based design with adaptive screening to balance exploration and exploitation. We prove a sublinear regret bound that grows only logarithmically in the covariate dimensionality; to our knowledge, this is the first such result for nonparametric contextual bandits with high-dimensional covariates. We also derive an information-theoretic lower bound, and the gap to the upper bound vanishes as the reward smoothness increases. Extensive experiments on synthetic data and real data from video recommendation and personalized medicine show strong performance in high-dimensional settings.
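As a rough illustration of the problem setting only (not of the SPARKLE algorithm), the Python sketch below simulates a two-armed contextual bandit whose mean rewards are sparse additive functions of a 50-dimensional covariate and tracks cumulative regret for two non-learning policies. Every name in it (`sparse_additive`, `cumulative_regret`, the component functions, the uniform covariate distribution) is a hypothetical choice made for this example.

```python
import numpy as np

def sparse_additive(active, funcs):
    """Mean reward f(x) = sum_j funcs[j](x[active[j]]); other coordinates are ignored."""
    def mean_reward(x):
        x = np.asarray(x, dtype=float)
        return float(sum(f(x[j]) for j, f in zip(active, funcs)))
    return mean_reward

def cumulative_regret(policy, arms, T, d, seed=0):
    """Run T rounds with covariates drawn uniformly from [0, 1]^d (an assumption)."""
    rng = np.random.default_rng(seed)
    total, path = 0.0, []
    for _ in range(T):
        x = rng.uniform(0.0, 1.0, size=d)
        means = [f(x) for f in arms]
        # Regret of a round: best achievable mean reward minus the chosen arm's mean.
        total += max(means) - means[policy(x, rng)]
        path.append(total)
    return np.array(path)

d = 50  # covariate dimension; each arm depends on only two coordinates
arms = [
    sparse_additive([0, 3], [lambda u: np.sin(2 * np.pi * u), lambda u: u ** 2]),
    sparse_additive([1, 3], [lambda u: np.cos(2 * np.pi * u), lambda u: -u]),
]

# Two reference policies: pick an arm uniformly at random, or use the true means.
uniform_policy = lambda x, rng: int(rng.integers(len(arms)))
oracle_policy = lambda x, rng: int(np.argmax([f(x) for f in arms]))

regret_uniform = cumulative_regret(uniform_policy, arms, T=500, d=d)
regret_oracle = cumulative_regret(oracle_policy, arms, T=500, d=d)
print(regret_uniform[-1], regret_oracle[-1])
```

The uniform policy accumulates regret linearly in the horizon, while the oracle incurs none; a learning algorithm with a sublinear regret bound sits between these two extremes.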

Paper Structure

This paper contains 51 sections, 19 theorems, 185 equations, 7 figures, 1 algorithm.

Key Result

Theorem 1

Suppose Assumptions assum_densitybound--assum_compatibility hold. Let $\widetilde{\Omega}\subseteq \Omega$ be $\mathscr{C}$-regular. Consider i.i.d. samples $\{(\widetilde{\bm{x}}_i, \widetilde{y}_i):i=1,\ldots,n\}$, where $\widetilde{\bm{x}}_i\sim {\mathbb{P}}_X(\cdot|\bm{x}\in\widetilde{\Omega})$. Then, with probability at least $1-2\delta$, for some constants $C, C', C''>0$ independent of …

Figures (7)

  • Figure 1: Small Support Gap Yields Large $L_\infty$ Error.
  • Figure 2: Evolution of Exploration and Exploitation Regions.
  • Figure 3: Pathological Behaviors Precluded by Assumption \ref{assum:regularity}.
  • Figure 4: Effect of $(d, T, s)$ on SPARKLE's Cumulative Regret.
  • Figure 5: Dependence of SPARKLE's Cumulative Regret on $(T, s)$ on the Log–Log Scale.
  • ...and 2 more figures

Theorems & Definitions (25)

  • Definition 1: $\mathscr{C}$-regularity
  • Theorem 1
  • Remark 1
  • Proposition 1
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Definition 2: Event $\mathcal{A}_{q}$
  • Proposition 2: Sufficient Data
  • Proposition 3: Well-Behaved Distribution
  • ...and 15 more