Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning
Jannik Deuschel, Caleb N. Ellington, Yingtao Luo, Benjamin J. Lengerich, Pascal Friederich, Eric P. Xing
TL;DR
CPR tackles the interpretability-accuracy tradeoff in policy learning by encoding rich contextual history and generating, on demand, a context-specific linear mapping from observations to actions. By combining a black-box context encoder with portable, glass-box policy predictors, CPR achieves state-of-the-art performance on medical tasks such as antibiotic prescription in ICUs and MRI ordering for Alzheimer's patients, while exposing the context-driven structure of clinical decisions. The framework supports both local (piecewise, context-specific) and global (telescoping, globally interpretable) interpretability. It is validated on real datasets and simulated MDPs, and its context-specific models reveal heterogeneity and outliers. This approach enables high-resolution, context-aware analysis of medical decision processes, with practical implications for auditing, personalization, and clinical guidelines.
Abstract
Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically under different contexts. Thus, we develop Contextualized Policy Recovery (CPR), which reframes the problem of modeling complex decision processes as a multi-task learning problem, where each context poses a unique task and complex decision policies can be constructed piecewise from many simple context-specific policies. CPR models each context-specific policy as a linear map, and generates new policy models on demand as contexts are updated with new observations. We provide two flavors of the CPR framework: one focusing on exact local interpretability, and one retaining full global interpretability. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement, CPR closes the accuracy gap between interpretable and black-box methods, allowing high-resolution exploration and analysis of context-specific decision models.
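To make the idea of generating a linear policy on demand concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary action, a tiny recurrent network as the black-box context encoder, and untrained random weights; the names `ContextEncoder`, `policy_params`, `action_probability`, and `rollout` are hypothetical and chosen only for illustration. At each step the encoder summarizes past observations and actions into a hidden state, emits the coefficients of a linear (logistic) policy, and that policy is applied to the current observation.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ContextEncoder:
    """Hypothetical black-box encoder: a tiny recurrent net over past
    (observation, action) pairs. Weights are random, i.e. untrained."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W_in = init(hidden_dim, obs_dim + 1)   # +1 input for the past action
        self.W_rec = init(hidden_dim, hidden_dim)
        self.W_out = init(obs_dim + 1, hidden_dim)  # emits coefficients plus a bias

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def update(self, h, obs, action):
        """Fold one completed (observation, action) pair into the context."""
        a = matvec(self.W_in, list(obs) + [float(action)])
        b = matvec(self.W_rec, h)
        return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

    def policy_params(self, h):
        """Generate the context-specific linear policy (theta, bias)."""
        out = matvec(self.W_out, h)
        return out[:-1], out[-1]


def action_probability(theta, bias, obs):
    """Glass-box part: a logistic model that is linear in the observation."""
    return sigmoid(sum(t * x for t, x in zip(theta, obs)) + bias)


def rollout(encoder, observations, actions):
    """Return (theta, bias, probability) for every step of one trajectory."""
    h = encoder.initial_state()
    trace = []
    for obs, action in zip(observations, actions):
        # The policy for this step depends only on the history before it.
        theta, bias = encoder.policy_params(h)
        trace.append((theta, bias, action_probability(theta, bias, obs)))
        h = encoder.update(h, obs, action)
    return trace


if __name__ == "__main__":
    enc = ContextEncoder(obs_dim=2, hidden_dim=3)
    observations = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]
    actions = [1, 0, 1]
    for step, (theta, bias, p) in enumerate(rollout(enc, observations, actions)):
        print(step, [round(t, 3) for t in theta], round(bias, 3), round(p, 3))
```

In this sketch each step's `theta` can be read like ordinary logistic-regression coefficients for that context, which is the sense in which the per-context policy stays interpretable even though the encoder that produced it is opaque. A real model would train the encoder end to end on observed actions, which this example does not do.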
