Functional Sequential Treatment Allocation with Covariates

Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

TL;DR

This work addresses sequential treatment allocation with covariates when the objective is a general functional $\mathsf{T}$ of the conditional outcome distribution rather than its mean. It introduces the Functional Upper Confidence Bound (F-UCB) policy with covariates. The policy partitions the covariate space into bins, estimates the conditional distribution functions $F^i(\cdot, x)$ of each arm $i$ within each bin, and assigns the arm with the largest upper confidence bound on $\mathsf{T}(F^i(\cdot, x))$. Under Hölder equicontinuity of the conditional distributions and a margin condition, the authors prove sublinear, near-minimax-optimal regret bounds and show that ignoring covariates leads to linear regret. The results extend prior functional-target bandit theory to covariate settings, providing adaptivity to arm similarity and ethical guarantees on exploration, with lower bounds matching the upper bounds up to logarithmic factors.
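The binned F-UCB idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Covariates are assumed to lie in $[0,1]^d$ and are binned into a regular grid with `m` cells per axis, and every arm keeps its own outcome history within each cell. The index is the plug-in value $\mathsf{T}(\hat F^i)$ plus an exploration bonus `lip * sqrt(log(horizon) / (2 s))`. This bonus is a Dvoretzky-Kiefer-Wolfowitz-style confidence width that assumes $\mathsf{T}$ is Lipschitz in the sup-norm with constant `lip`. The names `BinnedFUCB`, `bin_index`, `lip`, and the exact form of the bonus are hypothetical.

```python
import math
from collections import defaultdict


def bin_index(x, m):
    """Map a covariate vector x in [0, 1]^d to its cell in a grid with m cells per axis."""
    return tuple(min(int(xj * m), m - 1) for xj in x)


class BinnedFUCB:
    """Hypothetical sketch of a binned F-UCB policy (not the authors' code).

    Assumes the target functional T is Lipschitz in the sup-norm with constant
    `lip`, so a DKW-style width lip * sqrt(log(horizon) / (2 s)) serves as the
    exploration bonus after s observations of an arm in a cell.
    """

    def __init__(self, n_arms, m, horizon, functional, lip=1.0):
        self.n_arms = n_arms
        self.m = m
        self.horizon = horizon
        self.functional = functional  # maps a sorted outcome list to T(empirical CDF)
        self.lip = lip
        # outcomes[(cell, arm)] holds the outcomes of `arm` observed in `cell`
        self.outcomes = defaultdict(list)

    def select(self, x):
        cell = bin_index(x, self.m)
        best_arm, best_index = 0, -math.inf
        for arm in range(self.n_arms):
            data = self.outcomes[(cell, arm)]
            if not data:
                return arm  # assign each arm once per cell before comparing indices
            bonus = self.lip * math.sqrt(math.log(self.horizon) / (2 * len(data)))
            index = self.functional(sorted(data)) + bonus
            if index > best_index:
                best_arm, best_index = arm, index
        return best_arm

    def update(self, x, arm, y):
        """Record outcome y of `arm` in the cell containing covariate x."""
        self.outcomes[(bin_index(x, self.m), arm)].append(y)
```

For example, with outcomes in $[0, 1]$ and `functional=lambda s: sum(s) / len(s)`, the mean is 1-Lipschitz in the sup-norm, so `lip=1.0` fits the assumption. Calling `select(x)` and then `update(x, arm, y)` once per round runs the policy. Each cell runs its own independent index rule. The grid resolution `m` therefore trades bias against variance: wide cells mix covariate values with different best arms, while narrow cells see few observations.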

Abstract

We consider a multi-armed bandit problem with covariates. Given a realization of the covariate vector, instead of targeting the treatment with highest conditional expectation, the decision maker targets the treatment which maximizes a general functional of the conditional potential outcome distribution, e.g., a conditional quantile, trimmed mean, or a socio-economic functional such as an inequality, welfare or poverty measure. We develop expected regret lower bounds for this problem, and construct a near minimax optimal assignment policy.
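To make the targets concrete, the sketch below gives plug-in versions of the kinds of functionals listed in the abstract, each evaluated on the empirical distribution of an outcome sample. In the bandit setting, such a sample would be one arm's outcomes within a covariate cell. The specific definitions are illustrative assumptions and not necessarily the exact functionals treated in the paper: a lower empirical quantile, symmetric trimming, the Gini coefficient as the inequality measure, Sen's welfare index (mean times one minus Gini), and the headcount ratio as the poverty measure.

```python
import math


def quantile(sample, p):
    """Empirical p-quantile: smallest order statistic whose ECDF value is at least p."""
    xs = sorted(sample)
    k = max(math.ceil(p * len(xs)), 1)
    return xs[k - 1]


def trimmed_mean(sample, alpha):
    """Mean after dropping the floor(alpha * n) smallest and largest observations."""
    xs = sorted(sample)
    k = int(alpha * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def gini(sample):
    """Gini coefficient of a nonnegative sample with positive mean."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    # sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) x_(i) for sorted x_(1) <= ... <= x_(n)
    weighted = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return weighted / (n * n * mean)


def sen_welfare(sample):
    """Sen's welfare index: mean * (1 - Gini)."""
    return (sum(sample) / len(sample)) * (1 - gini(sample))


def headcount_poverty(sample, z):
    """Share of outcomes strictly below the poverty line z."""
    return sum(1 for x in sample if x < z) / len(sample)


outcomes = [1, 2, 3, 4]
print(quantile(outcomes, 0.5))           # 2
print(gini(outcomes))                    # 0.25
print(sen_welfare(outcomes))             # 1.875
print(headcount_poverty(outcomes, 2.5))  # 0.5
```

Inequality and poverty measures are typically "lower is better," so a decision maker targeting them would maximize their negatives.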

Paper Structure

This paper contains 17 sections, 12 theorems, 139 equations, 1 algorithm.

Key Result

Theorem 2.5

Suppose $K = 2$ and that Assumption as:lbcov is satisfied. Then there exists a constant $c_l > 0$, such that for every policy $\pi$ and any randomization measure, we have where the supremum is taken over all $(Y_t, X_t) \sim \mathbb{P}_{Y, X}$ for $t = 1, \hdots, n$, where $\mathbb{P}_{Y, X}$ satisfies Equations eqn:marginDincls and eqn:marginUEC, and where $\mathbb{P}_X$ is the uniform distribut

Theorems & Definitions (18)

  • Theorem 2.5
  • Theorem 2.7
  • Theorem 3.1
  • Corollary 3.2
  • Theorem 3.3
  • Remark 3.4: Unknown horizon and the doubling trick
  • Remark 3.5: Discrete covariates
  • Theorem 3.7
  • Theorem 3.8
  • Theorem 3.9
  • ...and 8 more