Inverse Probability of Treatment Weighting with Deep Sequence Models Enables Accurate Treatment Effect Estimation from Electronic Health Records

Junghwan Lee; Simin Ma; Nicoleta Serban; Shihao Yang

TL;DR

This paper tackles time-dependent confounding in longitudinal claims data and proposes estimating propensity scores with deep sequence models to enable unbiased treatment effect estimation via inverse probability of treatment weighting (IPTW). By using LSTM and BERT-based architectures to predict the propensity score directly from sequences of claims codes, the authors avoid hand-crafted feature processing and demonstrate improved accuracy in estimating the average treatment effect (ATE) on synthetic and semi-synthetic datasets. Key contributions include empirical evidence that deep sequence models outperform traditional baselines on both propensity score mean absolute error (MAE) and ATE MAE, and an attention-based interpretability analysis showing that confounding variables receive higher attention in BERT models. The work suggests a practical, scalable approach to treatment effect estimation from EHRs and claims data, with potential impact on observational study validity and causal inference in healthcare.

Abstract

Observational data have been actively used to estimate treatment effects, driven by the growing availability of electronic health records (EHRs). However, EHRs typically consist of longitudinal records, which often introduce time-dependent confounding that hinders unbiased estimation of treatment effects. Inverse probability of treatment weighting (IPTW) is a widely used propensity score method because it provides unbiased treatment effect estimation and its derivation is straightforward. In this study, we aim to use IPTW to estimate treatment effects in the presence of time-dependent confounding using claims records. Previous studies have applied propensity score methods to features derived from claims records through feature processing, which generally requires domain knowledge and additional resources to extract the information needed to estimate propensity scores accurately. Deep sequence models, particularly recurrent neural networks and self-attention-based architectures, have demonstrated good performance in modeling EHRs for various downstream tasks. We propose that these deep sequence models can provide accurate IPTW estimation of treatment effects by estimating the propensity scores directly from claims records, without the need for feature processing. We demonstrate this empirically through comprehensive evaluations on synthetic and semi-synthetic datasets.
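To make the IPTW step concrete, the following is a minimal sketch of the standard (unnormalized) IPTW estimator of the ATE, assuming propensity scores have already been produced by some model. The function name `iptw_ate`, the clipping threshold `eps`, and the toy inputs are illustrative assumptions and are not taken from the paper, whose exact estimator is not reproduced in this summary.

```python
from typing import Sequence


def iptw_ate(
    treatment: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
    eps: float = 1e-6,  # assumed clipping threshold, not from the paper
) -> float:
    """Unnormalized IPTW estimate of the average treatment effect.

    Computes (1/n) * sum_i [ A_i * Y_i / e_i - (1 - A_i) * Y_i / (1 - e_i) ],
    where A_i is the binary treatment, Y_i the outcome, and e_i the
    estimated propensity score P(A_i = 1 | history_i).
    """
    n = len(treatment)
    if n == 0 or len(outcome) != n or len(propensity) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for a, y, e in zip(treatment, outcome, propensity):
        # Clip the propensity score away from 0 and 1 to avoid division by zero.
        e = min(max(e, eps), 1.0 - eps)
        total += a * y / e - (1 - a) * y / (1.0 - e)
    return total / n


# Toy usage: treatment assigned with probability 0.5, true effect of 2.
ate = iptw_ate(
    treatment=[1, 0, 1, 0],
    outcome=[3.0, 1.0, 3.0, 1.0],
    propensity=[0.5, 0.5, 0.5, 0.5],
)
print(ate)
```

In the setting described above, the `propensity` values would come from a deep sequence model applied to each patient's claims records rather than from hand-crafted features; the quality of the ATE estimate then depends directly on how accurate those scores are.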

Paper Structure (28 sections, 20 equations, 4 figures, 5 tables)


Introduction
Methods
Preliminaries
Potential outcome framework and average treatment effect
Propensity score
Average treatment effect estimation using inverse probability of treatment weighting
Problem setup
Recurrent Neural Networks
Stacked Transformer Encoder Layers
Experiments
Synthetic Dataset
Semi-synthetic Dataset
Experiment Setup
Baseline methods
Evaluation metrics
...and 13 more sections

Figures (4)

Figure 1: (a) Causal diagram of our problem setup. $A$ denotes binary treatment and $Y$ denotes continuous outcome. A claims record $\mathbf{x}_t$ includes medical codes and can also include a treatment assignment $A$. (b) Causal diagram of a hypothetical confounding scenario in our experiment using a semi-synthetic dataset. The confounding depends on the record-wise distance between chronic sinusitis and viral sinusitis. Arrows between records are omitted for readability.
Figure 2: (a) LSTM to estimate propensity score using claims records. Average pooling is applied to the code representations to generate record representation, aggregating the representations of the codes present in the record. For example, if $\text{Record}_t$ contains Fever, Acetaminophen, and Cough codes, the record representation of $\text{Record}_t$ is generated by averaging the representations of these three codes. (b) $\text{BERT}_{\rm code}$ to estimate propensity score using claims records. The input representations are constructed by arranging code representations in chronological order. (c) $\text{BERT}_{\rm record}$ to estimate propensity score using claims records. The input representations are constructed using record representations, similar to LSTM.
Figure 3: Visualization of attention weights at the last encoder layer of $\text{BERT}_{\rm code}$ for selected samples from the test set. C indicates the position of confounding variables. [CLS] indicates the position of [CLS] token. Darker color represents higher attention weight.
Figure A.1: Distributions of propensity scores associated with the synthetic dataset under the three confounding scenarios and the semi-synthetic dataset. The gray bars indicate the treated samples and the unshaded bars indicate the untreated samples.
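
The average pooling described in the Figure 2(a) caption can be illustrated with a small sketch. The helper name `record_representation`, the two-dimensional embedding values, and the dictionary lookup are hypothetical and chosen only for illustration; in the paper the code representations are learned by the model.

```python
from typing import Dict, List, Sequence


def record_representation(
    codes: Sequence[str],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Average the code vectors of one claims record into a record vector."""
    if not codes:
        raise ValueError("a record must contain at least one code")
    vectors = [embeddings[code] for code in codes]
    dim = len(vectors[0])
    # Element-wise mean over all codes present in the record.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional code embeddings (illustrative values only).
toy_embeddings = {
    "Fever": [1.0, 0.0],
    "Acetaminophen": [0.0, 1.0],
    "Cough": [2.0, 2.0],
}

record_vec = record_representation(["Fever", "Acetaminophen", "Cough"], toy_embeddings)
print(record_vec)
```

Per the caption, the LSTM and $\text{BERT}_{\rm record}$ consume one such pooled vector per record, whereas $\text{BERT}_{\rm code}$ takes the individual code vectors in chronological order.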
