HR-Bandit: Human-AI Collaborated Linear Recourse Bandit
Junyu Cao, Ruijiang Gao, Esmaeil Keyvanshokooh
TL;DR
This paper studies online decision-making with actionable recourse in contextual bandits. It introduces Recourse Linear UCB (RLinUCB), which pairs a linear reward model with distance-bounded adjustments to mutable features and solves the resulting optimistic counterfactual optimization problem via ADMM, achieving sublinear regret. Its human–AI extension, HR-Bandit, adds a principled, data-driven human-in-the-loop mechanism with warm-start, robustness, and finite human-effort guarantees, and its regret bounds remain competitive even under adversarial human input. Experiments on synthetic and semi-synthetic healthcare data show improved performance with few human interventions, supporting the practicality of human–AI collaboration in medical recourse. The work advances online recourse methods and offers a foundation for safe, efficient human-guided bandits in clinical decision support.
Abstract
Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($\textsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($\textsf{HR-Bandit}$), which integrates human expertise to enhance performance. $\textsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
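To make the idea of jointly choosing an action and a feature modification concrete, here is a minimal sketch under simplifying assumptions. It is not the authors' RLinUCB procedure: it omits the exploration bonus and the ADMM solver, and it assumes known per-arm parameter vectors, an L2 distance budget, and an index list of mutable features. The names `best_recourse`, `choose_action_and_recourse`, `thetas`, `mutable`, and `budget` are hypothetical and introduced only for illustration. With the parameters fixed, maximizing a linear reward over an L2 ball has a closed form: move the mutable features along the direction of their coefficients.

```python
import math


def reward(x, theta):
    """Linear reward: the inner product of parameters and features."""
    return sum(a * b for a, b in zip(x, theta))


def best_recourse(x, theta, mutable, budget):
    """Plug-in recourse for a linear reward (illustrative assumption).

    Maximizes theta . (x + d) subject to ||d||_2 <= budget, where d may be
    nonzero only on the indices in `mutable`. By the Cauchy-Schwarz
    inequality, the maximizer shifts the mutable features along theta
    (restricted to those indices) by exactly `budget`.
    """
    norm = math.sqrt(sum(theta[i] ** 2 for i in mutable))
    x_new = list(x)
    if norm == 0.0:
        return x_new  # no mutable feature affects the reward
    for i in mutable:
        x_new[i] = x[i] + budget * theta[i] / norm
    return x_new


def choose_action_and_recourse(x, thetas, mutable, budget):
    """Pick the arm whose best recourse gives the highest estimated reward."""
    best = None
    for arm, theta in enumerate(thetas):
        x_new = best_recourse(x, theta, mutable, budget)
        value = reward(x_new, theta)
        if best is None or value > best[2]:
            best = (arm, x_new, value)
    return best


# Hypothetical example: feature 0 is immutable, features 1 and 2 are mutable.
arm, x_new, value = choose_action_and_recourse(
    x=[1.0, 0.0, 0.0],
    thetas=[[1.0, 3.0, 4.0], [2.0, 0.0, 0.0]],
    mutable=[1, 2],
    budget=1.0,
)
```

In an online setting the parameters are unknown, so an optimistic method would replace the plug-in reward with an upper confidence bound. That makes the recourse subproblem harder than this closed form, which is presumably why the paper resorts to ADMM.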
