LLM-Driven Treatment Effect Estimation Under Inference Time Text Confounding

Yuchen Ma, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

TL;DR

The paper tackles inference-time text confounding in CATE estimation: at training time the confounders are fully observed, but at inference time only textual descriptions are available. It introduces TCA, a three-stage framework that (i) uses an LLM to generate a text-based surrogate confounder from structured predictors, (ii) learns nuisance functions on the true confounders, and (iii) performs a doubly robust, text-conditioned regression to estimate the text-based CATE $\tau^t(t)$. A key theoretical result is the identification $\tau^t(t)=\mathbb{E}[\tau^x(X)\mid T=t]$, which enables unbiased estimation by conditioning on generated text while using the ground-truth confounders during training. Empirically, TCA consistently outperforms naive text-based baselines on IST and MIMIC-III in terms of PEHE and is robust to prompt strategies and to the choice of LLM, with practical implications for personalized medicine in telemedicine settings.
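The three stages can be illustrated with a small toy simulation. This is a minimal sketch under strong simplifying assumptions, not the authors' implementation: a binary variable `t` stands in for the LLM-generated text (it reveals `x1` but omits the confounder `x2`), and simple cell means replace both the learned nuisance models and the final text-conditioned regression. All function names (`simulate`, `cell_means`, `naive_text_cate`, `dr_text_cate`) and the data-generating process are hypothetical.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Toy data: x1 appears in the text, x2 is a confounder the text omits."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
        t = x1                                   # stand-in for LLM-generated text
        a = int(rng.random() < 0.2 + 0.6 * x2)   # treatment probability depends on x2
        tau_x = 1.0 + x1 + 2.0 * x2              # true CATE given the full confounders
        y = 2.0 * x2 + a * tau_x + rng.gauss(0.0, 1.0)
        rows.append((x1, x2, t, a, y))
    return rows


def cell_means(pairs):
    """Average the values that share a key; returns {key: mean}."""
    total, count = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        total[key] += value
        count[key] += 1
    return {key: total[key] / count[key] for key in total}


def naive_text_cate(rows):
    """Difference in mean outcomes between arms, conditioning on text only."""
    mu = cell_means(((t, a), y) for _, _, t, a, y in rows)
    return {t: mu[(t, 1)] - mu[(t, 0)] for t in (0, 1)}


def dr_text_cate(rows):
    """Nuisances fitted on (x1, x2); DR pseudo-outcomes averaged per text value."""
    prop = cell_means(((x1, x2), a) for x1, x2, _, a, _ in rows)
    mu = cell_means(((x1, x2, a), y) for x1, x2, _, a, y in rows)
    pseudo = []
    for x1, x2, t, a, y in rows:
        m1, m0, e = mu[(x1, x2, 1)], mu[(x1, x2, 0)], prop[(x1, x2)]
        # Doubly robust (AIPW-style) pseudo-outcome built from the true confounders.
        phi = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)
        pseudo.append((t, phi))
    # Second stage: regress pseudo-outcomes on text (here, a mean per text value).
    return cell_means(pseudo)


if __name__ == "__main__":
    rows = simulate(20000, seed=0)
    print("naive:", naive_text_cate(rows))
    print("doubly robust:", dr_text_cate(rows))
```

In this toy design the target is $\tau^t(t)=\mathbb{E}[\tau^x(X)\mid T=t]=2+t$, whereas the naive text-only contrast converges to $3.8+t$ because the omitted confounder `x2` drives both treatment and outcome. In a realistic setting the cell means would be replaced by flexible regressors on the structured covariates and on text embeddings, respectively.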

Abstract

Estimating treatment effects is crucial for personalized decision-making in medicine, but this task faces unique challenges in clinical practice. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time, predictions are often made using textual descriptions (e.g., descriptions with self-reported symptoms), which are incomplete representations of the original patient information. In this work, we make three contributions. (1) We show that the discrepancy between the data available during training time and inference time can lead to biased estimates of treatment effects. We formalize this issue as an inference time text confounding problem, where confounders are fully observed during training time but only partially available through text at inference time. (2) To address this problem, we propose a novel framework for estimating treatment effects that explicitly accounts for inference time text confounding. Our framework leverages large language models together with a custom doubly robust learner to mitigate biases caused by the inference time text confounding. (3) Through a series of experiments, we demonstrate the effectiveness of our framework in real-world applications.

Paper Structure

This paper contains 30 sections, 6 theorems, 39 equations, 4 figures, 5 tables, 1 algorithm.

Key Result

Lemma 4.1

For any $t \in \mathcal{T}$, under the paper's basic assumptions, the naïve baseline estimator $\tau^t_{\mathrm{naive}}(t)$ has a pointwise confounding bias with respect to the true CATE $\tau^t(t)$; the explicit expression is given in the paper.

Figures (4)

  • Figure 1: Discrepancy between training time and inference time. At training time, models for estimating treatment effects are typically trained on well-structured medical datasets that contain detailed patient information. However, at inference time (e.g., in telemedicine, remote healthcare consultations, or medical chatbots), predictions are often made using textual descriptions with self-reported symptoms. We formalize this discrepancy as inference time text confounding, where confounders are fully observed during training time but only partially available through text at inference time.
  • Figure 2: Causal graph for inference time text confounding. (a) At training time, we have access to the true confounders $X$, but the induced text confounders $T$ are unobserved. (b) At inference time, the induced text confounders $T$ are observed, while the true confounders $X$ are unavailable.
  • Figure 3: Performance of CATE estimation under varying confounder strengths and prompt strategies across datasets.
  • Figure: TCA for CATE estimation with inference time text confounding.
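
The causal structure in Figure 2 suggests why conditioning on text still targets a well-defined estimand. The following is a sketch of a tower-property argument, not the paper's proof. It assumes potential-outcome notation $Y(a)$, the definition $\tau^x(x)=\mathbb{E}[Y(1)-Y(0)\mid X=x]$, and that the text $T$ carries no information about the potential outcomes beyond $X$, i.e. $(Y(0),Y(1)) \perp T \mid X$; these symbols and conditions are assumptions made here for illustration.

$$
\begin{aligned}
\tau^t(t) &= \mathbb{E}[Y(1)-Y(0)\mid T=t] \\
&= \mathbb{E}\big[\,\mathbb{E}[Y(1)-Y(0)\mid X, T=t]\;\big|\; T=t\big] \\
&= \mathbb{E}\big[\,\mathbb{E}[Y(1)-Y(0)\mid X]\;\big|\; T=t\big] \\
&= \mathbb{E}[\tau^x(X)\mid T=t].
\end{aligned}
$$

The second line is the law of iterated expectations, and the third uses the assumed conditional independence. The result says that the text-based CATE is the average of the full-information CATE over the patients whose confounders could have produced the text $t$.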

Theorems & Definitions (13)

  • Lemma 4.1: Pointwise confounding bias of the naïve baseline
  • proof
  • Remark 4.2: Non-zero bias of the naïve estimator
  • Lemma 4.3: Identifiability of $\tau^t(t)$
  • proof
  • Corollary 4.4: Double robustness property of the estimator
  • Lemma B.1: Pointwise confounding bias of the naïve baseline
  • proof
  • Remark B.2: Non-zero bias of the naïve estimator
  • Lemma B.3: Identifiability of $\tau^t(t)$
  • ...and 3 more