Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

Jannik Deuschel, Caleb N. Ellington, Yingtao Luo, Benjamin J. Lengerich, Pascal Friederich, Eric P. Xing

TL;DR

CPR tackles the interpretability-accuracy tradeoff in policy learning by encoding rich contextual history into context-specific, linear observation-to-action mappings generated on demand. By combining a black-box context encoder with portable, glass-box policy predictors, CPR achieves state-of-the-art performance on medical tasks such as antibiotic prescription in ICUs and MRI ordering in Alzheimer's patients, while exposing the context-driven structure of clinical decisions. The framework supports both local (piecewise, context-specific) and global (telescoping, globally interpretable) interpretability, and is validated through real datasets and simulated MDPs to reveal heterogeneity and outliers. This approach enables high-resolution, context-aware analyses of medical decision processes with practical implications for auditing, personalization, and clinical guidelines.

Abstract

Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically under different contexts. Thus, we develop Contextualized Policy Recovery (CPR), which reframes the problem of modeling complex decision processes as a multi-task learning problem, where each context poses a unique task and complex decision policies can be constructed piecewise from many simple context-specific policies. CPR models each context-specific policy as a linear map, and generates new policy models on demand as contexts are updated with new observations. We provide two flavors of the CPR framework: one focusing on exact local interpretability, and one retaining full global interpretability. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement, CPR closes the accuracy gap between interpretable and black-box methods, allowing high-resolution exploration and analysis of context-specific decision models.
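
To make the idea of generating a linear policy from context concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary action, a plain tanh recurrent encoder over past (observation, action) pairs, and a logistic link. The class name `ContextualizedLinearPolicy`, its methods, the layer sizes, and the random untrained weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class ContextualizedLinearPolicy:
    """Hypothetical sketch: recurrent context encoder -> per-step linear policy."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Encoder input is the previous observation plus the previous action.
        self.W_in = init(hidden_dim, obs_dim + 1)
        self.W_h = init(hidden_dim, hidden_dim)
        # Head maps the context vector to obs_dim weights plus one intercept.
        self.W_theta = init(obs_dim + 1, hidden_dim)
        self.hidden_dim = hidden_dim

    def update_context(self, h, prev_obs, prev_action):
        """Black-box part: fold the last (observation, action) pair into h."""
        a = matvec(self.W_in, list(prev_obs) + [float(prev_action)])
        b = matvec(self.W_h, h)
        return [math.tanh(x + y) for x, y in zip(a, b)]

    def policy_params(self, h):
        """Glass-box part: the context-specific linear policy (weights, intercept)."""
        theta = matvec(self.W_theta, h)
        return theta[:-1], theta[-1]

    def rollout(self, observations, actions):
        """Return (weights, intercept, P(action=1)) for every step of a trajectory."""
        h = [0.0] * self.hidden_dim  # empty history
        steps = []
        for t, x in enumerate(observations):
            if t > 0:
                h = self.update_context(h, observations[t - 1], actions[t - 1])
            w, b = self.policy_params(h)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            steps.append((w, b, p))
        return steps


policy = ContextualizedLinearPolicy(obs_dim=3, hidden_dim=4)
steps = policy.rollout(
    observations=[[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]],
    actions=[1, 0, 1],
)
for t, (w, b, p) in enumerate(steps):
    print(t, [round(wi, 4) for wi in w], round(b, 4), round(p, 4))
```

Each step yields its own coefficient vector, so the weights can be read directly as the contribution of each observed feature to the action probability under that history. Because this sketch has no bias terms and starts from an all-zero context, the policy at the first step is all zeros and the action probability there is 0.5; a trained model would learn its parameters by fitting the predicted probabilities to the observed actions.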

Paper Structure

This paper contains 26 sections, 8 equations, 14 figures, and 6 tables.

Figures (14)

  • Figure 1: Exploration of contextualized policies generated by CPR for predicting antibiotic prescription. (a) Contextualized policies identify prior antibiotic prescription and (b) treatment time as drivers of treatment heterogeneity. (c) CPR generates policies that evolve with time and treatment history, revealing the context-specific importance of patient symptoms toward future treatments.
  • Figure 2: CPR generates decision models for marginal groups with high accuracy. Left: Using only a small subgroup of patients making up 7 observations in the training set, CPR identifies elevated creatinine as a severe risk factor for kidney failure and reassigns patients to a non-antibiotics treatment plan, while these patients would otherwise be likely to receive treatment. Right: For the small subgroup of patients below 20 years of age (with only 9 observations in the held-out set and 44/12 in the train/validation set), CPR improves drastically in terms of cross-entropy loss.
  • Figure 3: Comparison of patient-specific model parameter distributions by age and gender at visit $t=0$ after incorporating static contexts. Static contexts help to personalize initial models when no history is available.
  • Figure 4: Comparison of policy models learned by CPR and an RNN, measured by Pearson's correlation between estimated and true action probabilities and between estimated and true context-specific policy coefficients. We choose default simulation arguments $N=200$, $\sigma_a=0$, $T=15$, $\tau=4$, and $\sigma_{\theta}=0$, varying each parameter individually. We hold out 15% of trajectories at random for evaluation. Results are the mean and 95% confidence interval from three randomly initialized and independently simulated datasets.
  • Figure 5: CPR uses dynamic treatment contexts to generate the agent's decision model at each timestep. Decision models are a context-specific weighted combination of observed features. With these context-specific linear decision models, CPR achieves exact model-based interpretability without sacrificing representational capacity.
  • ...and 9 more figures