ExOSITO: Explainable Off-Policy Learning with Side Information for Intensive Care Unit Blood Test Orders
Zongliang Ji, Andre Carlos Kajdacsy-Balla Amaral, Anna Goldenberg, Rahul G. Krishnan
TL;DR
ExOSITO reframes ICU lab-test ordering as an off-policy causal contextual bandit problem with context $X$, actions $T$, and reward $Y$, leveraging offline data and side information from clinically validated rules. It combines a patient-status forecasting model $\phi$ (based on PatchTST) with a clinician-facing policy $\pi_\theta$ and a differentiable lab-order utility $g(t,x)$, regularized by a generalized propensity score $\hat{f}(t,x)$ estimated via conditional normalizing flows. Privileged clinical rules serve as side information to bound decisions and ensure reliable overlap, implemented through a Lagrangian optimization that enforces $\hat{f}(\pi(x),x) \ge \overline{\varepsilon}$. Empirical results on MIMIC-IV and HiRID show ExOSITO achieving greater information gain $\Delta X$ and lower costs than physician policies and prior RL approaches, while providing interpretable, rule-supported explanations for ICU lab-test ordering. The framework demonstrates potential for deployment as a safe, explainable decision-support tool that reduces unnecessary testing and hospital resource use.
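The overlap constraint $\hat{f}(\pi(x),x) \ge \overline{\varepsilon}$ mentioned above can be illustrated with a generic primal-dual sketch. This is not the paper's implementation: the function names, the multiplier `lam`, the step size `lr`, and the toy arrays are all hypothetical, and the utility and propensity values are simply given as numbers rather than produced by $g$ or a conditional normalizing flow.

```python
import numpy as np

def lagrangian(utility, propensity, lam, eps):
    """Primal loss to minimise (assumed form): negative mean utility plus
    a multiplier-weighted mean violation of propensity >= eps."""
    violation = eps - propensity  # positive where the constraint is violated
    return -utility.mean() + lam * violation.mean()

def dual_step(lam, propensity, eps, lr=0.1):
    """Projected gradient ascent on the multiplier, kept non-negative."""
    return max(0.0, lam + lr * (eps - propensity).mean())

# Toy values standing in for g(pi(x), x) and f_hat(pi(x), x) on three patients.
utility = np.array([1.0, 2.0, 3.0])
propensity = np.array([0.05, 0.20, 0.35])

loss = lagrangian(utility, propensity, lam=2.0, eps=0.3)
lam_next = dual_step(2.0, propensity, eps=0.3)
```

In a full training loop under these assumptions, the policy parameters would take a gradient step to reduce `loss`, and `lam` would then be updated with `dual_step`, so the multiplier grows while the average propensity of the chosen actions stays below the threshold.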
Abstract
Ordering a minimal subset of lab tests for patients in the intensive care unit (ICU) can be challenging. Care teams must balance ensuring the availability of the right information against reducing the clinical burden and costs associated with each lab test order. Most in-patient settings experience frequent over-ordering of lab tests, but are now aiming to reduce this burden on both hospital resources and the environment. This paper develops a novel method that combines off-policy learning with privileged information to identify the optimal set of ICU lab tests to order. Our approach, EXplainable Off-policy learning with Side Information for ICU blood Test Orders (ExOSITO), creates an interpretable assistive tool for clinicians to order lab tests by considering both the observed and predicted future status of each patient. We pose this problem as a causal bandit trained using offline data and a reward function derived from clinically approved rules; we introduce a novel learning framework that integrates clinical knowledge with observational data to bridge the gap between the optimal and logging policies. The learned policy function provides interpretable clinical information and reduces costs without omitting any vital lab orders, outperforming both a physician's policy and prior approaches to this practical problem.
