A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression

Dhruv Sarkar, Abhishek Sinha

TL;DR

The paper addresses constrained contextual bandits with adversarial contexts, introducing a simple, modular reduction that uses an online regression oracle to form surrogate rewards and an Inverse Gap Weighting (IGW) policy within the SquareCB framework. A central regret decomposition inequality is developed via a Lyapunov drift analysis on the cumulative constraint, enabling a unified treatment of exploration, estimation error, and constraints. Under realizability and sublinear regression error $U_T$, the approach yields regret and cumulative constraint violation (CCV) bounds of $\tilde{O}(dK T U_T^{1/4})$-type forms across several feasibility benchmarks (feasible in expectation, Slater, almost surely feasible, and long-term budget), and extends naturally to knapsack and linear-constraint variants. The framework is distribution-free with respect to the adversarially chosen contexts and continues operating after the budget is exhausted, giving guarantees for budgeted decision-making in non-stationary environments while relying only on off-the-shelf regression oracles.

Abstract

We study constrained contextual bandits (CCB) with adversarially chosen contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes. We consider the continuing setting, in which the algorithm operates over the entire horizon even after the budget is exhausted. In this setting, the objective is to simultaneously control regret and cumulative constraint violation. Building on the seminal SquareCB framework of Foster et al. (2018), we propose a simple and modular algorithmic scheme that leverages online regression oracles to reduce the constrained problem to a standard unconstrained contextual bandit problem with adaptively defined surrogate reward functions. In contrast to most prior work on CCB, which focuses on stochastic contexts, our reduction yields improved guarantees for the more general adversarial context setting, together with a compact and transparent analysis.

Paper Structure

This paper contains 28 sections, 3 theorems, 76 equations, 1 figure, 1 table, 2 algorithms.

Key Result

lemma 1

Fix an arbitrary $\hat{\bm{v}} \in \mathbb{R}^K$ and a parameter $\gamma > 0$. Then, for the probability distribution $p = \textsf{IGW}_{\gamma}(\hat{\bm{v}})$ given in Definition 5, for any vector $\bm{v} \in \mathbb{R}^K$ and any distribution $\bm{\mu} \in \Delta_{K}$, we have

Figures (1)

  • Figure 1: A schematic of the proposed algorithmic scheme for the constrained contextual bandit ($\mathsf{CCB}$) problem. The numbers within the circles show the sequence of operations performed at any round $t \geq 1$. The variable $Q(t)$ denotes the $\mathsf{CCV}$ after round $t$ and $\mathsf{IGW}$ denotes inverse gap weighting.
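
The per-round sequence that the caption describes can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm, and several pieces are assumptions: `igw` uses the standard inverse gap weighting rule in its reward form (larger score is better), which may differ in sign convention from the paper's Definition 5; `RunningMeanOracle` is a hypothetical tabular stand-in for an online regression oracle; the surrogate `v_weight * r_hat - q * c_hat` is a drift-plus-penalty style guess at how $Q(t)$ enters the surrogate reward; and the update `max(0, q + cost)` is one common way of tracking violation, assumed here rather than taken from the paper.

```python
import random


def igw(v_hat, gamma):
    """Inverse gap weighting over estimated scores (assumed: higher is better)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    k = len(v_hat)
    best = max(range(k), key=lambda a: v_hat[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Probability shrinks as the estimated gap to the best action grows.
            probs[a] = 1.0 / (k + gamma * (v_hat[best] - v_hat[a]))
    # The remaining mass goes to the greedy action (always at least 1/k).
    probs[best] = 1.0 - sum(probs)
    return probs


class RunningMeanOracle:
    """Hypothetical online regression oracle: running mean per (context, action)."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.sums = {}
        self.counts = {}

    def predict(self, context):
        return [
            self.sums.get((context, a), 0.0) / max(self.counts.get((context, a), 0), 1)
            for a in range(self.num_actions)
        ]

    def update(self, context, action, target):
        key = (context, action)
        self.sums[key] = self.sums.get(key, 0.0) + target
        self.counts[key] = self.counts.get(key, 0) + 1


def run_round(context, reward_oracle, cost_oracle, q, v_weight, gamma, env, rng):
    """One round: predict, form surrogate scores, sample via IGW, update."""
    r_hat = reward_oracle.predict(context)
    c_hat = cost_oracle.predict(context)
    # Assumed surrogate: weighted reward estimate minus Q-weighted cost estimate.
    surrogate = [v_weight * r - q * c for r, c in zip(r_hat, c_hat)]
    probs = igw(surrogate, gamma)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    reward, cost = env(context, action)
    reward_oracle.update(context, action, reward)
    cost_oracle.update(context, action, cost)
    # Assumed violation tracker: signed costs accumulate, clipped at zero.
    return action, reward, max(0.0, q + cost)


if __name__ == "__main__":
    rng = random.Random(0)
    reward_oracle, cost_oracle = RunningMeanOracle(2), RunningMeanOracle(2)
    # Toy environment: action 0 pays more but violates the constraint.
    env = lambda ctx, a: (1.0, 0.5) if a == 0 else (0.2, -0.5)
    q = 0.0
    for t in range(200):
        _, _, q = run_round("ctx", reward_oracle, cost_oracle, q, 1.0, 10.0, env, rng)
    print("final Q:", q)
```

In this sketch a larger `q` makes costly actions look worse in the surrogate scores, so the sampling distribution shifts toward feasible actions as violation accumulates, which is the intuition behind coupling $Q(t)$ with IGW.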

Theorems & Definitions (10)

  • definition 1: Feasible in Expectation
  • definition 2: Feasible in Expectation with Slater's condition
  • definition 3: Almost Surely Feasible
  • definition 4: Long-term Budget Feasible
  • remark 1
  • definition 5: Inverse Gap Weighting (Foster and Rakhlin, 2020)
  • lemma 1
  • theorem 2: Performance Bounds for Constrained Contextual Bandit
  • lemma 2
  • proof