Functional Sequential Treatment Allocation with Covariates
Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev
TL;DR
This work addresses sequential treatment allocation with covariates when the objective is a general functional $\mathsf{T}$ of the conditional outcome distribution rather than its mean. It introduces a Functional Upper Confidence Bound (F-UCB) policy with covariates: the covariate space is partitioned into bins, the conditional outcome distributions $F^i(\cdot, x)$ are estimated within each bin, and the arm with the largest upper confidence bound on $\mathsf{T}(F^i(\cdot, x))$ is selected. Under Hölder equicontinuity of the conditional distributions and a margin condition, the authors prove sublinear, near-minimax regret bounds and show that ignoring covariates leads to linear regret. The results extend prior functional-target bandit theory to covariate settings, providing adaptivity to arm similarity and ethical guarantees on exploration, with lower bounds matching the upper bounds up to logarithmic factors.
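To make the binning idea concrete, the following is a minimal sketch of a binned, functional UCB-style rule. It is not the authors' implementation: the scalar covariate in $[0, 1]$, the equal-width bins, the median as the functional $\mathsf{T}$, the bonus $c\sqrt{\log t / n}$, and all names (`BinnedFunctionalUCB`, `empirical_quantile`, `simulate`) are assumptions chosen for illustration, and the paper's confidence widths and bin-count calibration may differ.

```python
import math
import random
from typing import Callable, List, Sequence


def empirical_quantile(sample: Sequence[float], q: float = 0.5) -> float:
    """Example functional T: the q-quantile of the empirical distribution."""
    ordered = sorted(sample)
    # Smallest order statistic whose empirical cdf value is at least q.
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]


class BinnedFunctionalUCB:
    """Hypothetical binned UCB rule for a functional target (illustrative only)."""

    def __init__(self, n_arms: int, n_bins: int,
                 functional: Callable[[Sequence[float]], float],
                 bonus_scale: float = 1.0) -> None:
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.functional = functional
        self.bonus_scale = bonus_scale  # assumed tuning constant
        # samples[b][i] holds the outcomes observed for arm i in bin b.
        self.samples: List[List[List[float]]] = [
            [[] for _ in range(n_arms)] for _ in range(n_bins)
        ]
        self.rounds_in_bin = [0] * n_bins

    def bin_of(self, x: float) -> int:
        # Equal-width bins on [0, 1]; x = 1.0 falls into the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x: float) -> int:
        b = self.bin_of(x)
        # Play every arm once in a bin before using the index.
        for arm, obs in enumerate(self.samples[b]):
            if not obs:
                return arm
        t = self.rounds_in_bin[b]

        def index(arm: int) -> float:
            obs = self.samples[b][arm]
            bonus = self.bonus_scale * math.sqrt(math.log(t + 1) / len(obs))
            return self.functional(obs) + bonus  # plug-in estimate + optimism

        return max(range(self.n_arms), key=index)

    def update(self, x: float, arm: int, outcome: float) -> None:
        b = self.bin_of(x)
        self.samples[b][arm].append(outcome)
        self.rounds_in_bin[b] += 1


def simulate(n_rounds: int = 4000, n_bins: int = 4, seed: int = 0) -> float:
    """Share of optimal assignments in the final quarter of a toy run."""
    rng = random.Random(seed)
    policy = BinnedFunctionalUCB(2, n_bins, empirical_quantile)
    tail_start = n_rounds - n_rounds // 4
    hits = 0
    for t in range(n_rounds):
        x = rng.random()
        arm = policy.select(x)
        # Toy model: arm 0 has conditional median x, arm 1 has median 1 - x.
        median = x if arm == 0 else 1.0 - x
        policy.update(x, arm, median + rng.uniform(-0.1, 0.1))
        best = 0 if x >= 0.5 else 1
        if t >= tail_start and arm == best:
            hits += 1
    return hits / (n_rounds - tail_start)
```

In this toy model the best arm switches at $x = 0.5$, so any rule that ignores the covariate must assign the inferior arm on roughly half of the covariate space, whereas the binned rule can learn a different arm for each bin.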
Abstract
We consider a multi-armed bandit problem with covariates. Given a realization of the covariate vector, instead of targeting the treatment with highest conditional expectation, the decision maker targets the treatment which maximizes a general functional of the conditional potential outcome distribution, e.g., a conditional quantile, trimmed mean, or a socio-economic functional such as an inequality, welfare or poverty measure. We develop expected regret lower bounds for this problem, and construct a near minimax optimal assignment policy.
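In symbols, the target and one natural notion of regret can be sketched as follows. The notation beyond $\mathsf{T}$ and $F^i(\cdot, x)$ is assumed here for illustration ($K$ treatments, covariates $X_t$, assignment $\pi_t$ in round $t$, horizon $n$), and the paper's exact definitions may differ in details.

$$
\pi^\star(x) \in \arg\max_{1 \le i \le K} \mathsf{T}\bigl(F^i(\cdot, x)\bigr),
\qquad
R_n(\pi) = \sum_{t=1}^{n} \Bigl[ \max_{1 \le i \le K} \mathsf{T}\bigl(F^i(\cdot, X_t)\bigr) - \mathsf{T}\bigl(F^{\pi_t}(\cdot, X_t)\bigr) \Bigr].
$$

Under this reading, the expected regret is $\mathbb{E}[R_n(\pi)]$, and taking $\mathsf{T}$ to be the mean functional recovers the usual conditional-expectation target.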
