From Text to Treatment Effects: A Meta-Learning Approach to Handling Text-Based Confounding

Henri Arno, Paloma Rabaey, Thomas Demeester

TL;DR

Meta-learners that use pre-trained text representations of confounders, in addition to tabular background variables, achieve better CATE estimates than those relying solely on the tabular variables, particularly when sufficient data is available.

Abstract

One of the central goals of causal machine learning is the accurate estimation of heterogeneous treatment effects from observational data. In recent years, meta-learning has emerged as a flexible, model-agnostic paradigm for estimating conditional average treatment effects (CATE) using any supervised model. This paper examines the performance of meta-learners when the confounding variables are expressed in text. Through synthetic data experiments, we show that learners using pre-trained text representations of confounders, in addition to tabular background variables, achieve improved CATE estimates compared to those relying solely on the tabular variables, particularly when sufficient data is available. However, due to the entangled nature of the text embeddings, these models do not fully match the performance of meta-learners with perfect confounder knowledge. These findings highlight both the potential and the limitations of pre-trained text representations for causal inference and open up interesting avenues for future research.
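
To make the setup concrete, the following is a minimal sketch of a T-learner whose inputs are the tabular variables concatenated with a text embedding. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the ordinary-least-squares base model is a stand-in for whatever supervised model is actually used, and `phi_tab` / `phi_text` are simply arrays assumed to hold the tabular features and a pre-computed text embedding per sample.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in base learner)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with the coefficients returned by fit_linear."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


def t_learner_cate(phi_tab, phi_text, t, y, phi_tab_new, phi_text_new):
    """T-learner: fit one outcome model per treatment arm, then take the
    difference of their predictions as the CATE estimate.

    Assumption: phi_text is a fixed, pre-computed embedding of the text,
    concatenated column-wise with the tabular features phi_tab.
    """
    X = np.hstack([phi_tab, phi_text])
    X_new = np.hstack([phi_tab_new, phi_text_new])
    coef_treated = fit_linear(X[t == 1], y[t == 1])
    coef_control = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(coef_treated, X_new) - predict_linear(coef_control, X_new)
```

Dropping `phi_text` from the concatenation gives the tabular-only baseline, while replacing it with the true confounder values corresponds to the perfect-knowledge setting described in the abstract.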

Paper Structure (22 sections, 6 equations, 4 figures)

Figures (4)

  • Figure 1: Overview of the experimental setup in three panels. Panel (a) presents the different types of representations $\Phi_{text}$ of the text-based confounders, which are concatenated with the tabular variables $\Phi_{tab}$ to form the model inputs. Panel (b) illustrates the architecture used to estimate the nuisance parameters, which are then transformed, together with the observed outcomes and treatment indicators, into the pseudo-outcomes for each learner (except the T-learner). Panel (c) depicts the second-stage regressor, which uses these pseudo-outcomes as targets to estimate the CATE.
  • Figure 2: Performance comparison of the meta-learners across four settings: the text-based confounders represented (1) with perfect knowledge of them, (2) as pre-trained BioLord embeddings, (3) as pre-trained MPNet embeddings and (4) with no knowledge of them. The figure shows the PEHE on the test set (lower values indicate better performance) for each learner (columns) across two training set sizes (rows). A dashed red line at PEHE = 1 is included to aid comparison across different y-axis scales. The boxplots display results over different random seeds to illustrate variability due to weight initialisation and data sampling.
  • Figure 3: An example entry from the SynSUM dataset that combines structured tabular variables, sampled from a Bayesian network, with a textual clinical note, generated by GPT-4o. This synthetic example represents a realistic record of a patient encounter in primary care, containing both structured data (e.g., the patient's underlying conditions) and unstructured text (e.g., a description of a physical examination revealing the symptoms the patient experiences).
  • Figure 4: Performance comparison of the meta-learners across two settings: the text-based confounders represented (1) with perfect knowledge of them and (2) with no knowledge of them. The figure shows the PEHE on the test set (lower values indicate better performance) for each learner (columns) across four training set sizes (rows). A dashed red line at PEHE = 1 is included to aid comparison across different y-axis scales. The boxplots display results over different random seeds to illustrate variability due to weight initialisation and data sampling.
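
The two quantities that the captions above rely on, pseudo-outcomes and PEHE, can be sketched as follows. This is a hedged illustration only: the doubly robust (DR-learner) pseudo-outcome is used as one common example, since the captions do not state which pseudo-outcome each learner uses, and PEHE is assumed to be reported in its root-mean-squared form (some papers report the squared version). The function and argument names are hypothetical.

```python
import numpy as np


def dr_pseudo_outcome(y, t, mu0, mu1, e):
    """Doubly robust pseudo-outcome (one common choice, assumed here).

    y   : observed outcomes
    t   : binary treatment indicators (0 or 1)
    mu0 : estimated outcome under control, mu_0(x)
    mu1 : estimated outcome under treatment, mu_1(x)
    e   : estimated propensity score, P(T = 1 | x), strictly in (0, 1)
    """
    mu_t = np.where(t == 1, mu1, mu0)  # outcome model for the observed arm
    return mu1 - mu0 + (t - e) / (e * (1 - e)) * (y - mu_t)


def pehe(tau_true, tau_hat):
    """Root-mean-squared error between true and estimated CATE
    (assumption: root form of the precision in estimating
    heterogeneous effects)."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_hat = np.asarray(tau_hat, dtype=float)
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))
```

In a two-stage setup like the one in Figure 1, the pseudo-outcomes would serve as regression targets for the second-stage model, and `pehe` would then be evaluated on a test set where the true effects are known, which is possible here because the data are synthetic.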