
Using LLMs to Directly Guess Conditional Expectations Can Improve Efficiency in Causal Estimation

Chris Engh, P. M. Aronow

TL;DR

This work tackles causal estimation under high-dimensional, multimodal confounding by augmenting nuisance-function estimates with LLM-generated predictions of $E[Y|W]$ and $E[D|W]$. In a case study on online jewelry auctions, the authors show that incorporating LLM guesses alongside standard embeddings in a residual-on-residual DML framework improves the accuracy of $E[Y|W]$ predictions and reduces estimation uncertainty for the causal parameter $\theta$, while the gains for $E[D|W]$ are modest. The key insight is that the knowledge and reasoning stored in LLMs act as informative priors that help recover nonlinear relationships when data are limited, offering a practical path to more efficient causal inference with multimodal confounders. The approach extends to additional synthetic predictors and to other estimators, potentially broadening the impact of LLM-assisted causal analysis in econometrics and related fields.
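For orientation, here is a sketch of the residual-on-residual estimator, assuming the standard partially linear model used in double machine learning (the paper's own equations may differ in notation). $\hat\ell$ and $\hat m$ denote cross-fitted estimates of the two conditional expectations; in this framing, LLM guesses of $E[Y|W]$ and $E[D|W]$ enter as extra predictors when fitting them.

$$Y = \theta D + g(W) + \varepsilon, \qquad E[\varepsilon \mid D, W] = 0,$$

$$\ell(W) = E[Y \mid W], \qquad m(W) = E[D \mid W],$$

$$\hat\theta = \frac{\sum_{i=1}^{n} \big(D_i - \hat m(W_i)\big)\big(Y_i - \hat\ell(W_i)\big)}{\sum_{i=1}^{n} \big(D_i - \hat m(W_i)\big)^2}.$$

Errors in $\hat\ell$ and $\hat m$ enter the residuals directly, which is why more accurate predictions of $E[Y|W]$ can tighten the estimate of $\theta$ in small samples.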

Abstract

We propose a simple yet effective use of LLM-powered AI tools to improve causal estimation. In double machine learning, the accuracy of causal estimates of the effect of a treatment on an outcome in the presence of a high-dimensional confounder depends on the performance of estimators of conditional expectation functions. We show that predictions made by generative models trained on historical data can be used to improve the performance of these estimators relative to approaches that rely solely on adjusting for embeddings extracted from these models. We argue that the historical knowledge and reasoning capacities associated with these generative models can help overcome curse-of-dimensionality problems in causal inference. We consider a case study using a small dataset of online jewelry auctions and demonstrate that including LLM-generated guesses as predictors can improve efficiency in estimation.
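The idea of using guesses "as predictors" can be illustrated with a minimal, self-contained sketch of a cross-fitted residual-on-residual estimator. Everything here is an assumption for illustration, not the authors' pipeline: the data are simulated, closed-form ridge regression stands in for whatever nuisance learners the paper uses, the "embeddings" are random vectors, and the hypothetical `guess_y` and `guess_d` columns are noisy versions of the true conditional expectations rather than real LLM output. The helper names `ridge_fit_predict` and `dml_plr` are likewise hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression; the intercept is left unpenalized via centering."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = X_train.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def dml_plr(y, d, X, n_folds=5, alpha=1.0, seed=0):
    """Cross-fitted residual-on-residual estimate of theta and its standard error."""
    n = len(y)
    # Randomly assign each observation to one of n_folds balanced folds
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Out-of-fold predictions of E[Y|W] and E[D|W]
        y_res[test] = y[test] - ridge_fit_predict(X[train], y[train], X[test], alpha)
        d_res[test] = d[test] - ridge_fit_predict(X[train], d[train], X[test], alpha)
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res  # orthogonal score evaluated at theta
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# --- Simulated example (all data synthetic) ---
rng = np.random.default_rng(1)
n, p, theta_true = 500, 20, 1.0
emb = rng.normal(size=(n, p))                              # stand-in for embeddings of W
m = 0.5 * emb[:, 0] + 0.5 * emb[:, 1]                      # E[D|W], linear here
g = 3 * np.sin(2 * emb[:, 0]) + 2 * (emb[:, 1] ** 2 - 1)   # nonlinear confounding of Y
d = m + rng.normal(size=n)
y = theta_true * d + g + rng.normal(size=n)

# Hypothetical "LLM guesses": noisy versions of the true E[Y|W] and E[D|W]
guess_y = theta_true * m + g + rng.normal(scale=0.5, size=n)
guess_d = m + rng.normal(scale=0.5, size=n)

theta_emb, se_emb = dml_plr(y, d, emb)
theta_aug, se_aug = dml_plr(y, d, np.column_stack([emb, guess_y, guess_d]))
print(f"embeddings only:      theta={theta_emb:.3f}, se={se_emb:.3f}")
print(f"embeddings + guesses: theta={theta_aug:.3f}, se={se_aug:.3f}")
```

In this simulated setup the guess-augmented fit should report a noticeably smaller standard error, because a linear learner cannot recover the nonlinear confounding from the embeddings alone, while the guess column supplies it directly. That outcome depends entirely on the simulated guesses being informative; it is not a reproduction of the paper's results.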

Paper Structure

This paper contains 17 sections, 8 equations, 2 tables.