A Design-based Solution for Causal Inference with Text: Can a Language Model Be Too Large?
Graham Tierney, Srikar Katta, Christopher Bail, Sunshine Hillygus, Alexander Volfovsky
TL;DR
This paper addresses causal inference for linguistic properties, focusing on the latent confounding and overlap violations that arise when text features encode the treatment $T$. It introduces a design-based experimental approach in which study participants write texts and editors flip $T$ while holding all other content fixed, enabling unbiased estimation of the treatment effect $\tau_t$ despite latent confounding. Through simulations and a large real-data experiment, it shows that language-model-based estimators (TextCause, TI) can induce positivity violations and bias, while simpler bag-of-words approaches often perform more robustly; the authors also provide a practical, auditable benchmark for evaluating text-as-treatment estimators. Substantively, expressing intellectual humility in political arguments reduces perceived aggressiveness but also lowers informativeness and persuasiveness, revealing a trade-off in using humility to combat polarization. The design and its ground-truth validation framework can guide future methodological development, with implications for social media platforms, policymakers, and causal-text research.
Abstract
Many social science questions ask how linguistic properties causally affect an audience's attitudes and behaviors. Because text properties are often interlinked (e.g., angry reviews use profane language), we must control for possible latent confounding to isolate causal effects. Recent literature proposes adapting large language models (LLMs) to learn latent representations of text that successfully predict both treatment and the outcome. However, because the treatment is a component of the text, these deep learning methods risk learning representations that actually encode the treatment itself, inducing overlap bias. Rather than depending on post-hoc adjustments, we introduce a new experimental design that handles latent confounding, avoids the overlap issue, and unbiasedly estimates treatment effects. We apply this design in an experiment evaluating the persuasiveness of expressing humility in political communication. Methodologically, we demonstrate that LLM-based methods perform worse than even simple bag-of-words models using our real text and outcomes from our experiment. Substantively, we isolate the causal effect of expressing humility on the perceived persuasiveness of political statements, offering new insights on communication effects for social media platforms, policy makers, and social scientists.
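The intuition behind the design can be illustrated with a small simulation. The sketch below is not the authors' estimator or data: the function names, the latent property `u`, the effect size `tau`, and the confounding strength `beta` are all hypothetical values chosen for illustration. It assumes a single latent text property that influences both whether a writer expresses the treatment and how readers rate the text, and it compares a naive observational difference in means with the mean difference between paired versions of the same text in which only $T$ has been flipped.

```python
import random
import statistics


def rate(t_value, u, tau, beta, rng):
    """Hypothetical reader rating: treatment effect + latent property + noise."""
    return tau * t_value + beta * u + rng.gauss(0.0, 1.0)


def simulate(n_texts=5000, tau=-0.5, beta=1.0, seed=0):
    """Return (naive_estimate, paired_design_estimate) of the effect tau.

    All parameter values are illustrative assumptions, not results from the paper.
    """
    rng = random.Random(seed)
    treated, control, paired_diffs = [], [], []

    for _ in range(n_texts):
        # Latent text property (e.g. tone) that readers also respond to.
        u = rng.gauss(0.0, 1.0)
        # Writers with high u are more likely to express the treatment,
        # so T and u are confounded in the naturally occurring texts.
        t = 1 if rng.random() < (0.8 if u > 0 else 0.2) else 0

        # Observational data: only the original text is rated.
        y_obs = rate(t, u, tau, beta, rng)
        (treated if t == 1 else control).append(y_obs)

        # Design-based data: an editor produces the flipped version,
        # holding u fixed, and both versions are rated.
        y1 = rate(1, u, tau, beta, rng)
        y0 = rate(0, u, tau, beta, rng)
        paired_diffs.append(y1 - y0)

    naive = statistics.fmean(treated) - statistics.fmean(control)
    design = statistics.fmean(paired_diffs)
    return naive, design


if __name__ == "__main__":
    naive, design = simulate()
    print(f"true effect:            {-0.5:+.3f}")
    print(f"naive observational:    {naive:+.3f}")
    print(f"paired (flipped T):     {design:+.3f}")
```

Because `u` is identical within each pair, it cancels in the paired difference, so the design-based estimate stays close to `tau` while the naive comparison absorbs the confounding through `u`. The real experiment involves human raters and edited texts rather than a parametric outcome model, so this is only a schematic of why holding other content fixed removes the need for post-hoc adjustment.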
