Safe and Interpretable Estimation of Optimal Treatment Regimes

Harsh Parikh, Quinn Lanners, Zade Akras, Sahar F. Zafar, M. Brandon Westover, Cynthia Rudin, Alexander Volfovsky

TL;DR

This work develops a safe, interpretable framework to estimate patient-specific optimal treatment regimes for seizures in critically ill patients by integrating mechanistic PK/PD modeling, distance-metric learning, and interpolation across matched patients. It addresses challenges typical of high-stakes medical data (missingness, continuous dosing, and small sample sizes) without relying on predefined rewards. In extensive synthetic experiments and on a real ICU seizure cohort, the method demonstrates robust performance, improved outcomes, and actionable dosing insights, particularly for high-risk subgroups. The clinical application suggests substantial potential to personalize anti-seizure medication (ASM) strategies and motivates future trials to validate heterogeneous causal effects in critical care.

Abstract

Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regimes. This approach involves matching patients with similar medical and pharmacological characteristics, allowing us to construct an optimal policy via interpolation. We perform a comprehensive simulation study to demonstrate the framework's ability to identify optimal policies even in complex settings. Ultimately, we apply our approach to study regimes for treating seizures in critically ill patients. Our findings strongly support personalized treatment strategies based on a patient's medical history and pharmacological features. Notably, we find that reducing medication doses for patients with mild and brief seizure episodes, while adopting aggressive treatment for patients in the intensive care unit experiencing intense seizures, leads to more favorable outcomes.
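
The matching-then-interpolation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `matched_policy`, the weighted Euclidean distance, the use of a plain average as the interpolation step, and the fallback rule are all assumptions made for illustration. It supposes each patient has a covariate vector, a vector of observed regime parameters, and a binary outcome where 0 means a good outcome.

```python
import numpy as np

def matched_policy(X, theta, y, weights, i, k=5):
    """Hypothetical sketch: build a regime for patient i from matched patients.

    X       : (n, p) patient covariates
    theta   : (n, q) observed regime parameters, one row per patient
    y       : (n,)   outcomes, 0 = good, 1 = poor (assumed coding)
    weights : (p,)   non-negative feature weights standing in for a learned metric
    """
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y)
    weights = np.asarray(weights, dtype=float)

    # Weighted Euclidean distance from patient i to every patient.
    diff = X - X[i]
    dist = np.sqrt((weights * diff ** 2).sum(axis=1))
    dist[i] = np.inf  # do not match a patient to themself

    # The k closest patients form the matched group.
    neighbors = np.argsort(dist)[:k]

    # Keep only matched patients whose observed outcome was good.
    good = neighbors[y[neighbors] == 0]
    if good.size == 0:
        # Assumed fallback: keep the patient's observed regime.
        return theta[i].copy()

    # Interpolate by averaging the regimes of well-performing matches.
    return theta[good].mean(axis=0)
```

Because the result is an average over a handful of identifiable matched patients, a clinician can inspect exactly which cases produced a recommendation, which is one plausible reading of what makes this style of estimator interpretable.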

Paper Structure (41 sections, 2 theorems, 16 equations, 14 figures, 4 tables)


Key Result

Proposition 1

Given the conditional ignorability, local positivity, and smoothness of outcomes assumptions, $\widehat{\pi}^*_i$ is a consistent estimate of $\pi^*_i$, such that

Figures (14)

  • Figure 1: Percent of patients with poor outcomes under each method's proposed policy (lower is better). Boxplots show the distribution of the average outcomes over 20 iterations. Observed shows average observed outcomes. Inaction and Max Dosing administer no drugs and the maximum amount of drugs, respectively, to each patient at each timestep. RF Q-learning is a finite-timestep backward induction method using random forests. Infinite (Inf) Horizon methods use fitted Q-iteration \cite{clifton2020q} with either linear models or random forests. Q-learning and Inf Horizon discretize the treatment into five bins. BCQ, CQL, CRR, GGPQ, SAC, and TD3 are Deep RL methods. Inf Horizon and Deep RL methods use an insightful reward function; see Appendix \ref{sec: appendix_comp_method}.
  • Figure 2: (a) Estimated density of the outcome probabilities under the optimal and clinician-administered policies. (b) Tree characterizing the subpopulations that would have benefited the most by switching to the optimal policy. The value at each node in the tree shows the percentage point improvement in the outcome. Here, HIE/ABI refers to hypoxic-ischemic encephalopathy (HIE) and anoxic brain injury (ABI).
  • Figure 3: Difference in the propofol doses between the optimal and the administered regimes for mild and severe EA burden in the last 1h for (a) patients at various levels of the Glasgow Coma Scale (GCS); (b) patients with various APACHE II scores; and (c) patients with various levels of ED50 for propofol, an important pharmacodynamic parameter determining the amount of drug required to reduce EA burden by 50%.
  • Figure 4: Difference in the levetiracetam doses between the optimal and the administered regimes for (a) patients with and without dementia experiencing a sustained EA burden for 12 hours; and (b) patients with and without subarachnoid hemorrhage experiencing a sustained EA burden for 6 hours.
  • Figure 5: Percent of patients with poor outcomes under different proposed policies (lower is better). Boxplots show the distribution of the average outcomes over 20 iterations. Observed shows average observed outcomes. Expert shows outcomes under the expert policies. Linear and DTree Q-learning are finite-timestep backward induction Q-learning methods using either linear models or decision trees. Linear and DTree OptClass are optimal classifier methods using either linear models or decision trees. See Appendix \ref{sec: appendix_comp_method} for further details of each method. Note that not all backward induction methods converged for all 20 iterations of each setup. See all_sims_nan.csv and Appendix \ref{sec: appendix-full-synth-results-files} for details.
  • ...and 9 more figures
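
To make the ED50 parameter in the Figure 3 caption concrete, the sketch below uses a Hill-type dose-response curve, a common pharmacodynamic form. Whether the paper's PK/PD model uses exactly this form is an assumption; the function name `hill_effect` and the `hill` coefficient are hypothetical and chosen only for illustration. The one property taken from the caption is that a drug level equal to ED50 halves the EA burden.

```python
def hill_effect(conc, ed50, hill=1.0):
    """Fractional reduction in EA burden (between 0 and 1) at drug level `conc`.

    Assumed Hill-type curve: conc**hill / (ed50**hill + conc**hill).
    `ed50` is the drug level giving a 50% reduction; `hill` controls steepness.
    """
    if ed50 <= 0:
        raise ValueError("ed50 must be positive")
    if conc <= 0:
        return 0.0  # no drug, no reduction
    return conc ** hill / (ed50 ** hill + conc ** hill)


# Two hypothetical patients given the same drug level respond differently:
# the patient with the larger ED50 needs more drug for the same effect.
sensitive = hill_effect(conc=2.0, ed50=1.0)
resistant = hill_effect(conc=2.0, ed50=4.0)
```

Under this assumed curve, a patient with a higher ED50 gets a smaller reduction from the same drug level, which is why ED50 is a natural covariate for personalizing dose.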

Theorems & Definitions (8)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Remark 3
  • Remark 4
  • Proposition 1: Consistency of Treatment Regime Estimator
  • Remark 5
  • Remark 6