Inverse Probability of Treatment Weighting with Deep Sequence Models Enables Accurate Treatment Effect Estimation from Electronic Health Records
Junghwan Lee, Simin Ma, Nicoleta Serban, Shihao Yang
TL;DR
This paper tackles time-dependent confounding in longitudinal claims data and proposes estimating propensity scores with deep sequence models to enable unbiased treatment effect estimation via inverse probability of treatment weighting (IPTW). By using LSTM and BERT-based architectures to predict the propensity score directly from sequences of claims codes, the authors avoid hand-crafted feature processing and demonstrate improved accuracy in estimating the average treatment effect (ATE) on synthetic and semi-synthetic datasets. Key contributions include empirical evidence that deep sequence models achieve lower mean absolute error (MAE) than traditional baselines on both the propensity score and the ATE, and an attention-based interpretability analysis showing that confounding variables receive higher attention in BERT models. The work suggests a practical, scalable approach to treatment effect estimation from EHR and claims data, with potential impact on the validity of observational studies and on causal inference in healthcare.
Abstract
Observational data are increasingly used to estimate treatment effects, driven by the growing availability of electronic health records (EHRs). However, EHRs typically consist of longitudinal records, which often introduce time-dependent confounding that hinders unbiased estimation of treatment effects. Inverse probability of treatment weighting (IPTW) is a widely used propensity score method because it provides unbiased treatment effect estimates and its derivation is straightforward. In this study, we use IPTW to estimate treatment effects from claims records in the presence of time-dependent confounding. Previous studies have applied propensity score methods to features derived from claims records through feature processing, which generally requires domain knowledge and additional resources to extract the information needed to estimate propensity scores accurately. Deep sequence models, particularly recurrent neural networks and self-attention-based architectures, have demonstrated good performance in modeling EHRs for various downstream tasks. We propose that these deep sequence models can provide accurate IPTW estimates of treatment effects by estimating the propensity scores directly from claims records, without the need for feature processing. We demonstrate this empirically through comprehensive evaluations on synthetic and semi-synthetic datasets.
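
To make the weighting step concrete, the following is a minimal sketch of an IPTW estimate of the ATE once propensity scores are available. It is illustrative only and is not taken from the paper: the function name `iptw_ate`, the clipping threshold `eps`, and the choice of the normalized (Hajek-style) form of the estimator are all assumptions, and the toy propensity scores are set by hand where the paper would use the output of a sequence model.

```python
def iptw_ate(treatments, outcomes, propensities, eps=1e-6):
    """Normalized IPTW estimate of the average treatment effect.

    treatments   : iterable of 0/1 treatment indicators
    outcomes     : iterable of observed outcomes
    propensities : iterable of estimated P(treatment = 1 | history)
    eps          : hypothetical clipping threshold to avoid division by zero
    """
    num_treated = den_treated = 0.0
    num_control = den_control = 0.0
    for a, y, e in zip(treatments, outcomes, propensities):
        e = min(max(e, eps), 1.0 - eps)  # keep weights finite
        if a == 1:
            w = 1.0 / e                  # weight for treated units
            num_treated += w * y
            den_treated += w
        else:
            w = 1.0 / (1.0 - e)          # weight for control units
            num_control += w * y
            den_control += w
    if den_treated == 0.0 or den_control == 0.0:
        raise ValueError("need at least one treated and one control unit")
    # Difference of weighted outcome means in the two arms.
    return num_treated / den_treated - num_control / den_control


# Toy example (assumed data): a binary confounder x raises both the
# treatment probability and the outcome; the true effect is 2.
treatments, outcomes, propensities = [], [], []
for x, e, n_treated, n_control in [(0, 0.2, 2, 8), (1, 0.8, 8, 2)]:
    for a, n in [(1, n_treated), (0, n_control)]:
        for _ in range(n):
            treatments.append(a)
            outcomes.append(2.0 * a + 3.0 * x)
            propensities.append(e)

treated = [y for a, y in zip(treatments, outcomes) if a == 1]
control = [y for a, y in zip(treatments, outcomes) if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
print("naive difference in means:", naive)
print("IPTW estimate:", iptw_ate(treatments, outcomes, propensities))
```

In this toy population the unweighted difference in means is inflated by the confounder, while the weighted estimate recovers the effect of 2 up to floating-point error. The quality of the estimate depends entirely on the accuracy of the propensity scores, which is the quantity the abstract proposes to obtain from deep sequence models.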
