HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

Junyu Cao, Ruijiang Gao, Esmaeil Keyvanshokooh

TL;DR

The paper addresses online decision-making with actionable recourse in contextual bandits by introducing Recourse Linear UCB (RLinUCB) and its human–AI extension HR-Bandit. It combines a linear reward model with distance-bounded mutable feature adjustments and develops optimistic counterfactual optimization solved via ADMM, achieving sublinear regret. HR-Bandit adds a principled, data-driven human-in-the-loop mechanism that yields warm-start, robustness, and finite human-effort guarantees, while maintaining competitive regret bounds even with adversarial human input. Empirical results on synthetic and semi-synthetic healthcare data validate improved performance and limited human interventions, supporting the practicality of human–AI collaboration in medical recourse. The work advances online recourse methods and offers a foundation for safe, efficient human-guided bandits in clinical decision-support contexts.

Abstract

Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($\textsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($\textsf{HR-Bandit}$), which integrates human expertise to enhance performance. $\textsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
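The interplay between action selection and feature modification described above can be illustrated with a deliberately simplified sketch. It is not the paper's RLinUCB: the optimistic counterfactual optimization (solved via ADMM in the paper) is replaced here by a plug-in step that shifts the context along the current ridge estimate, and every feature is assumed mutable. The function name `recourse_linucb`, the parameters `gamma`, `alpha`, `lam`, and `noise`, and the simulated environment are all hypothetical.

```python
import numpy as np

def recourse_linucb(T=300, d=4, n_arms=3, gamma=0.5, alpha=1.0,
                    lam=1.0, noise=0.1, seed=0):
    """Simplified recourse bandit loop on simulated data (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=(n_arms, d))   # hypothetical ground-truth parameters
    A = np.stack([lam * np.eye(d) for _ in range(n_arms)])  # per-arm regularized Gram matrices
    b = np.zeros((n_arms, d))                   # per-arm reward-weighted feature sums
    rewards, shifts = [], []

    for _ in range(T):
        x = rng.normal(size=d)                  # observed context (all features mutable here)
        best_score, best_arm, best_x = -np.inf, None, None
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta_hat = A_inv @ b[a]            # ridge estimate for arm a
            norm = np.linalg.norm(theta_hat)
            # Plug-in recourse: move at most gamma along the estimated direction.
            x_new = x + gamma * theta_hat / norm if norm > 0 else x.copy()
            # Standard LinUCB-style score evaluated at the modified features.
            score = theta_hat @ x_new + alpha * np.sqrt(x_new @ A_inv @ x_new)
            if score > best_score:
                best_score, best_arm, best_x = score, a, x_new

        # Simulated noisy linear reward for the chosen arm and recourse.
        r = theta_true[best_arm] @ best_x + noise * rng.normal()
        A[best_arm] += np.outer(best_x, best_x)
        b[best_arm] += r * best_x
        rewards.append(r)
        shifts.append(np.linalg.norm(best_x - x))

    return np.array(rewards), np.array(shifts)

rewards, shifts = recourse_linucb()
```

The `shifts` array records how far each recommended recourse moved the context, so it can be checked against the distance budget `gamma`. A human-in-the-loop variant, as in HR-Bandit, would additionally decide when to consult an expert; that logic is not sketched here.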

Paper Structure

This paper contains 17 sections, 8 theorems, 66 equations, 8 figures, 3 algorithms.

Key Result

Lemma 1

Let $d(x,x') = \|x-x'\|_2$ be the distance function. Then, for a given action $a$, the optimal solution to the optimization problem is $\check{x}_M^* = x_M + \gamma\,\theta_{a,M}^*/\|\theta_{a,M}^*\|_2$.
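A small numerical check can make the closed form concrete. The sketch below assumes the objective is the linear reward $\theta_{a,M}^{*\top}\check{x}_M$ maximized over the ball $\|\check{x}_M - x_M\|_2 \le \gamma$, which is how the summary describes the model; the function name `l2_recourse` and the numbers are hypothetical.

```python
import numpy as np

def l2_recourse(x_m, theta_m, gamma):
    """Shift mutable features x_m by gamma along the unit direction of theta_m."""
    norm = np.linalg.norm(theta_m)
    if norm == 0.0:
        return x_m.copy()          # no preferred direction, leave features unchanged
    return x_m + gamma * theta_m / norm

x_m = np.array([1.0, -2.0, 0.5])       # hypothetical mutable features
theta_m = np.array([3.0, 0.0, -4.0])   # hypothetical mutable-feature coefficients
gamma = 0.5                            # recourse budget (assumed L2 radius)

x_star = l2_recourse(x_m, theta_m, gamma)
best_value = theta_m @ x_star

# Compare against random feasible points inside the L2 ball of radius gamma.
rng = np.random.default_rng(0)
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = gamma * rng.uniform(size=(1000, 1))
candidates = x_m + radii * directions
max_random_value = (candidates @ theta_m).max()

print(x_star, best_value, max_random_value)
```

Under these assumptions the closed-form point lies on the boundary of the ball, and its value equals $\theta_{a,M}^{*\top} x_M + \gamma\|\theta_{a,M}^*\|_2$ (here $1 + 0.5 \cdot 5 = 3.5$), which no sampled feasible point exceeds.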

Figures (8)

  • Figure 1: Illustration of Recourse Bandit and HR-Bandit. Recourse Bandit offers algorithmic recourses to patients to improve their health conditions for more effective treatment. HR-Bandit selectively consults human experts based on uncertainty estimates, enabling a data-driven integration of human expertise and AI recommendations.
  • Figure 2: Synthetic Data.
  • Figure 3: Fertility Data.
  • Figure 4: Impact of $\zeta$ (Human Trust).
  • Figure 5: Impact of $\Delta$ (Human Effort).
  • ...and 3 more figures

Theorems & Definitions (14)

  • Lemma 1: Closed-form Solution for Two-norm Distance
  • Lemma 2: Closed-form Solution for Box Constraint
  • Theorem 1: Regret of RLinUCB
  • Theorem 2: Warm-Start Guarantee
  • Theorem 3: Human-Effort Guarantee
  • Theorem 4: Improvement Guarantee
  • Theorem 5: Robustness Guarantee
  • Proof: Proof of the optimal-solution lemma
  • Lemma 3
  • Proof: Proof of the regret theorem
  • ...and 4 more