Deep Causal Behavioral Policy Learning: Applications to Healthcare
Jonas Knecht, Anna Zink, Jonathan Kolstad, Maya Petersen
TL;DR
This work tackles the challenge of learning high-dimensional, dynamic provider behavior and its causal impact on patient outcomes in healthcare. The authors propose DC-BPL, a two-stage framework that first combines a structural causal model with semiparametric estimation to identify the optimal provider, and then learns provider-specific behavioral policies with transformer architectures through a large clinical behavioral model, so that the policies of top providers can be emulated for scalable decision support and provider coaching. In a proof of concept on UCSF emergency department data, they show that transformers can learn the action mechanism and that learned separation metrics correlate with predictive accuracy, supporting safe deployment in clinical settings. The work lays groundwork for causal quality measurement, coaching, and decision support that draw on the tacit clinical knowledge embedded in provider actions, with potential extensions to multimodal data and reasoning-model alignment.
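The first stage, estimating the causal effect of provider assignment with semiparametric methods to find the optimal provider, can be illustrated with augmented inverse probability weighting (AIPW), a standard semiparametric estimator. This is a minimal sketch, not the authors' estimator, and it rests on assumptions not stated in the text: a single discrete patient-type covariate, no unmeasured confounding given that covariate, positivity (every provider treats every patient type), and higher outcomes being better. The function name `aipw_provider_means` and the simulated data are hypothetical.

```python
import numpy as np

def aipw_provider_means(x, a, y, n_types, n_providers):
    """AIPW estimates of E[Y(k)], the mean outcome if every patient were
    assigned to provider k, for a discrete patient-type covariate x.

    Nuisances are fit by empirical cell frequencies and means, which assumes
    no unmeasured confounding given x and positivity (every provider treats
    every patient type).
    """
    g = np.zeros((n_types, n_providers))  # g[t, k] estimates P(A = k | X = t)
    q = np.zeros((n_types, n_providers))  # q[t, k] estimates E[Y | A = k, X = t]
    for t in range(n_types):
        in_type = x == t
        for k in range(n_providers):
            cell = in_type & (a == k)
            g[t, k] = cell.sum() / in_type.sum()
            q[t, k] = y[cell].mean()
    psi = np.empty(n_providers)
    for k in range(n_providers):
        assigned = (a == k).astype(float)
        # Outcome-model prediction plus inverse-propensity-weighted residual.
        psi[k] = np.mean(q[x, k] + assigned / g[x, k] * (y - q[x, k]))
    return psi, q

# Hypothetical simulated data: 2 patient types, 3 providers, higher Y is better.
rng = np.random.default_rng(0)
n = 20_000
true_q = np.array([[1.0, 0.5, 0.0],   # patient type 0
                   [0.0, 1.0, 0.5]])  # patient type 1
assign_probs = np.array([[0.6, 0.3, 0.1],   # type 0 mostly sees provider 0
                         [0.1, 0.3, 0.6]])  # type 1 mostly sees provider 2
x = rng.integers(0, 2, n)
u = rng.random(n)
# Inverse-CDF sampling of the provider given patient type (confounded assignment).
a = (u[:, None] > assign_probs.cumsum(axis=1)[x][:, :-1]).sum(axis=1)
y = true_q[x, a] + rng.normal(0.0, 0.1, n)

psi, q = aipw_provider_means(x, a, y, n_types=2, n_providers=3)
naive = np.array([y[a == k].mean() for k in range(3)])
best_provider_by_type = q.argmax(axis=1)
print("naive:", naive.round(2))
print("AIPW: ", psi.round(2))
print("best provider per type:", best_provider_by_type)
```

In this simulation the naive per-provider mean ranks provider 0 highest because that provider mostly sees easier patients, while the adjusted estimates recover the true marginal means of roughly 0.5, 0.75, and 0.25 and pick a different best provider for each patient type. With saturated cell estimates, as here, the correction term averages to exactly zero and AIPW coincides with the plug-in estimator; the augmentation matters when the outcome and propensity models are flexible learners.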
Abstract
We present a deep learning-based approach to studying dynamic clinical behavioral regimes in diverse non-randomized healthcare settings. Our proposed methodology, deep causal behavioral policy learning (DC-BPL), uses deep learning algorithms to learn the distribution of high-dimensional clinical action paths and identifies the causal link between these action paths and patient outcomes. Specifically, our approach (1) identifies the causal effects of provider assignment on clinical outcomes; (2) learns the distribution of clinical actions a given provider would take given evolving patient information; and (3) combines these steps to identify the optimal provider for a given patient type and emulate that provider's care decisions. As the backbone of this strategy, we train a large clinical behavioral model (LCBM) on electronic health record data using a transformer architecture, and demonstrate its ability to estimate clinical behavioral policies. We propose a novel interpretation of a behavioral policy learned by the LCBM: it is an efficient encoding of the complex, often implicit, knowledge used to treat a patient. This allows us to learn a space of policies that is critical to a wide range of healthcare applications, where the vast majority of clinical knowledge is acquired tacitly through years of practice and only a tiny fraction of the information relevant to patient care is written down (e.g. in textbooks, studies or standardized guidelines).
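Steps (2) and (3) can be illustrated with a deliberately simplified stand-in for the LCBM: a tabular autoregressive policy that conditions only on the current discretized patient state and the previous action, rather than a transformer attending over the full history. It estimates each provider's next-action distribution from counts with add-alpha smoothing, scores an observed path by its log-likelihood under a provider's policy, and emulates a chosen provider by greedily taking that provider's most likely action. The path format, the function names (`fit_tabular_policy`, `path_log_likelihood`, `emulate`), the action labels, and the choice of `best_provider` are all hypothetical.

```python
from collections import defaultdict
import math

def fit_tabular_policy(paths, n_actions, alpha=1.0):
    """Estimate P(A_t = a | provider k, state s_t, previous action a_{t-1}).

    paths: list of (provider, [(state, action), ...]) tuples, one per encounter.
    alpha: add-alpha (Laplace) smoothing so unseen actions keep nonzero mass.
    Returns a function prob(provider, state, prev_action, action).
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0.0] * n_actions))
    for provider, steps in paths:
        prev = None  # no action precedes the first step of an encounter
        for state, action in steps:
            counts[provider][(state, prev)][action] += 1
            prev = action

    def prob(provider, state, prev, action):
        c = counts[provider][(state, prev)]
        return (c[action] + alpha) / (sum(c) + alpha * n_actions)

    return prob

def path_log_likelihood(prob, provider, steps):
    """Log-probability of an observed action path under one provider's policy."""
    prev, total = None, 0.0
    for state, action in steps:
        total += math.log(prob(provider, state, prev, action))
        prev = action
    return total

def emulate(prob, provider, states, n_actions):
    """Greedy rollout: at each step take the provider's most likely next action."""
    prev, actions = None, []
    for state in states:
        scores = [prob(provider, state, prev, a) for a in range(n_actions)]
        prev = scores.index(max(scores))
        actions.append(prev)
    return actions

# Hypothetical toy encounters. Actions: 0 = lab, 1 = imaging, 2 = discharge.
# States are a discretized summary of patient information at each step.
paths = [(0, [(0, 0), (1, 1)])] * 5 + [(1, [(0, 1), (1, 2)])] * 5
prob = fit_tabular_policy(paths, n_actions=3)

best_provider = 0  # hypothetical: the provider selected by the causal step
print(emulate(prob, best_provider, states=[0, 1], n_actions=3))  # [0, 1]
```

For simplicity the state sequence is supplied up front; in a clinical setting the next state would be observed only after each action, so a rollout would interleave predictions with new patient information.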
