Learning Treatment Policies From Multimodal Electronic Health Records

Henri Arno, Thomas Demeester

TL;DR

This work proposes an extension of causal policy learning that uses expert-provided annotations during training to supervise treatment effect estimation, while using only multimodal representations as input during inference, and achieves strong empirical performance across synthetic, semi-synthetic, and real-world EHR datasets.

Abstract

We study how to learn effective treatment policies from multimodal electronic health records (EHRs) that consist of tabular data and clinical text. These policies can help physicians make better treatment decisions and allocate healthcare resources more efficiently. Causal policy learning methods prioritize patients with the largest expected treatment benefit. Yet, existing estimators assume tabular covariates that satisfy strong causal assumptions, which are typically violated in the multimodal setting. As a result, predictive models of baseline risk are commonly used in practice to guide such decisions, as they extend naturally to multimodal data. However, such risk-based policies are not designed to identify which patients benefit most from treatment. We propose an extension of causal policy learning that uses expert-provided annotations during training to supervise treatment effect estimation, while using only multimodal representations as input during inference. We show that the proposed method achieves strong empirical performance across synthetic, semi-synthetic, and real-world EHR datasets, thereby offering practical insights into applying causal machine learning to realistic clinical data.

Paper Structure

This paper contains 73 sections, 13 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Overview of the problem setting. Left: Electronic health records combine tabular data and clinical text: key confounders (e.g., cough, fever) may appear only in text. Middle: Multimodal confounders influence both treatment $T$ and outcome $Y$. Right: Heterogeneity in the conditional average treatment effect $\tau^x(x)=\mathbb{E}[Y(1)-Y(0)\mid X=x]$ across patient characteristics.
  • Figure 2: Overview of the proposed method. Top: Multimodal electronic health records are (1) annotated by experts to identify confounders and (2) encoded into joint representations using a pre-trained model. Bottom: The causal framework proceeds in three stages: (1) nuisance estimation and pseudo-outcome construction from annotated data, (2) effect estimation from multimodal representations, and (3) inference-time policy application based on the estimated coarsened effects.
  • Figure 3: Preservation of treatment effect ordering. When the representations lose some confounding information, the coarsened effects differ from the true effects, yet the ordering of individuals in the population, and hence the implied treatment policy, can remain unchanged. This holds as long as the coarsening bias ($\delta_i$) is small relative to the gaps between the true treatment effects ($\gamma_i$).
  • Figure 4: Precision in estimation of heterogeneous effects (PEHE) on the SynSum dataset as a function of training set size (mean $\pm$ std over 5 random seeds). The second-stage treatment effect models were trained and evaluated with three input types: the true confounders, the multimodal representations (text + tabular), and the tabular variables only.
  • Figure 5: Relationship between the true baseline risk $\mu_0(x)$ and the true treatment effect $\tau^x(x)$ across synthetic datasets. In SynSum (left), the relationship is negative: patients with higher baseline risk tend to experience greater treatment benefit (more negative effects). In contrast, in MIMIC-Syn (right), the relationship is positive: treatment benefit attenuates as baseline risk increases. The oscillations visible in the right subplot arise from the sinusoidal age term in $\tau^x(x)$.
  • ...and 3 more figures
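
The captions of Figures 3 and 4 refer to two checks that can be illustrated on toy numbers: whether a biased (coarsened) effect estimate still ranks individuals in the same order as the true effects, and how far the estimates are from the truth as measured by PEHE. The sketch below is an illustration only, not the authors' implementation. The function names `pehe`, `ordering_preserved`, and `top_k_policy`, as well as all numeric values, are hypothetical. PEHE is computed here as a root mean squared error, which is one common convention; the paper's exact definition is not stated in this excerpt. Following the caption of Figure 5, more negative effects are treated as greater benefit.

```python
import numpy as np


def pehe(tau_hat, tau_true):
    """Root mean squared error between estimated and true effects.

    Assumption: this is one common PEHE convention, not necessarily the paper's.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))


def ordering_preserved(tau_true, delta):
    """Check whether ranking by tau_true + delta matches ranking by tau_true."""
    tau_true = np.asarray(tau_true, dtype=float)
    coarsened = tau_true + np.asarray(delta, dtype=float)
    true_rank = np.argsort(tau_true, kind="stable")
    coarsened_rank = np.argsort(coarsened, kind="stable")
    return bool(np.array_equal(true_rank, coarsened_rank))


def top_k_policy(scores, k):
    """Treat the k individuals with the most negative (most beneficial) score."""
    scores = np.asarray(scores, dtype=float)
    policy = np.zeros(scores.shape[0], dtype=int)
    policy[np.argsort(scores, kind="stable")[:k]] = 1
    return policy


# Hypothetical true effects with gaps of 0.15 between neighbours.
tau_true = np.array([-0.40, -0.25, -0.10, 0.05])

# Bias that is small relative to the gaps: estimates are off, ranking is intact.
small_delta = np.array([0.05, -0.04, 0.03, -0.05])

# Bias that exceeds the gap between the first two individuals: ranking changes.
large_delta = np.array([0.20, -0.10, 0.00, 0.00])

for name, delta in [("small", small_delta), ("large", large_delta)]:
    estimate = tau_true + delta
    print(
        name,
        "PEHE:", round(pehe(estimate, tau_true), 4),
        "ordering preserved:", ordering_preserved(tau_true, delta),
        "top-2 policy:", top_k_policy(estimate, 2).tolist(),
    )
```

In this toy setting a nonzero PEHE coexists with an unchanged ranking when the bias is small, which is the situation Figure 3 describes. A budgeted top-k policy can also stay the same when the full ranking changes, as long as the set of the k most beneficial individuals is unaffected.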