LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding
Yuchen Ma, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel
TL;DR
The paper tackles inference-time text confounding in conditional average treatment effect (CATE) estimation: confounders are fully observed at training time, but at inference time only textual descriptions are available. It introduces TCA, a three-stage framework that (1) uses an LLM to generate a text-based surrogate of the structured confounders, (2) learns nuisance functions on the true confounders, and (3) performs a doubly robust, text-conditioned regression to estimate the text-based CATE, $\tau^t(t)$. A key theoretical result is the identification of the text-based CATE as $\tau^t(t)=\mathbb{E}[\tau^x(X)\mid T=t]$, where $\tau^x$ is the CATE given the true confounders $X$ and $T$ is the generated text. This enables unbiased estimation by conditioning on the generated text while leveraging ground-truth confounders during training. Empirically, TCA consistently outperforms naive text-based baselines on IST and MIMIC-III in terms of PEHE (precision in estimation of heterogeneous effects) and is robust to different prompt strategies and LLMs, with practical implications for personalized medicine in telemedicine settings.
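The three stages can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data-generating process is hypothetical, the LLM-generated text is replaced by a stand-in variable `t` that reveals only one of two binary confounders, the nuisance functions are simple cell means without cross-fitting, and the pseudo-outcome is the standard AIPW form, which may differ from the paper's exact learner.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data: two binary confounders x1, x2 (assumption, not from the paper).
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
pi_true = 0.2 + 0.6 * x2                 # treatment assignment depends on x2
a = rng.binomial(1, pi_true)
tau_x = 1.0 + x1 + 2.0 * x2              # true CATE given full confounders
y = 3.0 * x2 + a * tau_x + rng.normal(0.0, 1.0, n)

# Stage 1 (stand-in): the "text" reveals x1 but not x2.
# In the framework described above, an LLM would generate this surrogate.
t = x1.copy()

# Stage 2: nuisance functions estimated on the TRUE confounders (cell means).
cell = 2 * x1 + x2
pi_hat = np.zeros(n)
mu0_hat = np.zeros(n)
mu1_hat = np.zeros(n)
for c in range(4):
    m = cell == c
    pi_hat[m] = a[m].mean()
    mu1_hat[m] = y[m & (a == 1)].mean()
    mu0_hat[m] = y[m & (a == 0)].mean()

# Stage 3: doubly robust (AIPW) pseudo-outcome, then regression on the text.
phi = (mu1_hat - mu0_hat
       + a * (y - mu1_hat) / pi_hat
       - (1 - a) * (y - mu0_hat) / (1 - pi_hat))
tca = np.array([phi[t == v].mean() for v in (0, 1)])

# Naive baseline: difference in outcome means using the text only.
naive = np.array([
    y[(t == v) & (a == 1)].mean() - y[(t == v) & (a == 0)].mean()
    for v in (0, 1)
])

# Ground truth: tau^t(t) = E[tau^x(X) | T = t] = 1 + t + 2 * P(x2 = 1) = 2 + t.
truth = np.array([2.0, 3.0])
print("truth:", truth, "text-conditioned DR:", tca, "naive:", naive)
```

In this toy setup the naive text-only contrast is confounded by `x2`, which the text does not reveal, whereas averaging the pseudo-outcomes within each text value targets $\mathbb{E}[\tau^x(X)\mid T=t]$.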
Abstract
Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available at training time and at inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference-time text confounding problem, where confounders are fully observed at training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference-time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by inference-time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.
