Evaluating Interventional Reasoning Capabilities of Large Language Models
Tejas Kasetty, Divyat Mahajan, Gintare Karolina Dziugaite, Alexandre Drouin, Dhanya Sridhar
TL;DR
This work investigates how large language models update their beliefs about causal relationships after interventions. It introduces intervention effects (IE), a zero-shot, prompt-based binary task defined over three canonical DAGs (bivariate, confounding, mediation), to separate causal-reasoning capability from memorization and surface-text shortcuts. Across three benchmarks (Random Char, Tübingen, Random Tübingen) and six models, the study finds that GPT-4 family models, especially GPT-4-turbo, demonstrate strong interventional reasoning, while LLaMA variants lag and memorization plays a limited role. The findings advance understanding of abstract causal reasoning in LLMs and point to future work on broader graphs, causal-identification tasks, and methods to improve non-GPT models.
Abstract
Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs' ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and that enable a study of intervention-based reasoning. These benchmarks allow us to separate the ability of LLMs to accurately predict the changes that result from an intervention from their ability to memorize facts or find other shortcuts. We evaluate six LLMs on the benchmarks and find that GPT models show promising accuracy at predicting intervention effects.
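To make the idea of updating a causal graph after an intervention concrete, the following is a minimal sketch and not the authors' benchmark code. It assumes the standard graph-surgery view, in which intervening on a variable removes every edge pointing into it, and it treats the binary label as "did the intervention remove a given direct causal relation?". The edge sets for the three graphs, the variable names (A, B, C, M), and the helper names `intervene` and `intervention_effect` are all illustrative assumptions.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)

# Illustrative edge sets for the three graph families named in the summary.
# The exact structures used in the paper's benchmarks are not given here,
# so these are assumptions.
GRAPHS: Dict[str, Set[Edge]] = {
    "bivariate": {("A", "B")},
    "confounding": {("C", "A"), ("C", "B"), ("A", "B")},
    "mediation": {("A", "M"), ("M", "B")},
}


def intervene(edges: Set[Edge], target: str) -> Set[Edge]:
    """Graph surgery for do(target): drop every edge pointing into target."""
    return {(cause, effect) for (cause, effect) in edges if effect != target}


def intervention_effect(edges: Set[Edge], target: str, relation: Edge) -> bool:
    """Return True if do(target) removes the direct edge `relation`.

    This is a hypothetical binary label: it is True only when the relation
    held before the intervention and no longer holds afterwards.
    """
    return relation in edges and relation not in intervene(edges, target)


if __name__ == "__main__":
    for name, edges in GRAPHS.items():
        for target in sorted({v for edge in edges for v in edge}):
            for relation in sorted(edges):
                label = intervention_effect(edges, target, relation)
                print(f"{name}: do({target}) removes {relation}? {label}")
```

In the confounding graph above, for example, intervening on A cuts the edge from C to A but leaves the edge from A to B intact. A model that only recalls memorized facts about the variables would have no basis for telling these two cases apart.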
