Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation

Eleni Sgouritsa; Virginia Aglietti; Yee Whye Teh; Arnaud Doucet; Arthur Gretton; Silvia Chiappa

TL;DR

This work tackles the challenge of inferring causal structure from correlation statements using large language models. It introduces PC-SubQ, a prompting strategy that decomposes natural language causal discovery (NL-CD) into fixed steps of the PC algorithm, guided by sequential subquestions with minimal historical context. Across five LLMs and the Corr2Cause benchmark, PC-SubQ achieves higher F1 and accuracy than baseline prompting strategies and demonstrates robustness to variable renaming, paraphrasing, and natural-story inputs, while providing transparent reasoning traces. The approach achieves strong results without model fine-tuning and suggests a general, interpretable framework for algorithmic reasoning tasks in NLP-enabled causal inference.

Abstract

The reasoning abilities of Large Language Models (LLMs) are attracting increasing attention. In this work, we focus on causal reasoning and address the task of establishing causal relationships based on correlation information, a highly challenging problem on which several LLMs have shown poor performance. We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions, with each subquestion corresponding to one step of a formal causal discovery algorithm, the PC algorithm. The proposed prompting strategy, PC-SubQ, guides the LLM to follow these algorithmic steps by sequentially prompting it with one subquestion at a time, augmenting the next subquestion's prompt with the answer to the previous one(s). We evaluate our approach on an existing causal benchmark, Corr2Cause: our experiments indicate a performance improvement across five LLMs when comparing PC-SubQ to baseline prompting strategies. Results are robust to causal query perturbations, namely when the variable names are modified or the expressions are paraphrased.
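
The sequential prompting loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the subquestion wording in `PC_SUBQUESTIONS`, the function `run_subquestion_chain`, and the `llm` callable are all hypothetical names introduced here, and the sketch assumes, as a simplification, that every previous answer is appended to the next prompt.

```python
from typing import Callable, List

# Hypothetical subquestions, one per high-level stage of the PC algorithm.
# The wording is illustrative only and is NOT taken from the paper.
PC_SUBQUESTIONS: List[str] = [
    "List every pair of variables as an edge of a fully connected undirected graph.",
    "Remove each edge whose two variables are stated to be (conditionally) independent.",
    "Identify and orient the colliders (v-structures) implied by the remaining edges.",
    "Orient any further edges that can be oriented without creating new colliders or cycles.",
    "Using the resulting graph, decide whether the hypothesis holds.",
]


def run_subquestion_chain(
    premise: str,
    subquestions: List[str],
    llm: Callable[[str], str],
) -> List[str]:
    """Ask one subquestion at a time, feeding earlier answers into later prompts.

    `llm` is an assumed interface: any function mapping a prompt string to a
    completion string (for example, a thin wrapper around a model API).
    """
    answers: List[str] = []
    for question in subquestions:
        # Assumption: all earlier answers are included in the prompt.
        history = "".join(
            f"Previous answer {k}: {a}\n" for k, a in enumerate(answers, start=1)
        )
        prompt = f"Premise: {premise}\n{history}Question: {question}"
        answers.append(llm(prompt))
    return answers
```

The last element of the returned list would be the model's answer to the final causal query, and the earlier elements form a step-by-step trace that can be inspected when an answer is wrong.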
