LLMs Are Prone to Fallacies in Causal Inference
Nitish Joshi, Abulhair Saparov, Yixin Wang, He He
TL;DR
This work investigates whether large language models (LLMs) can infer causal relations beyond explicitly memorized causal facts, by finetuning them on synthetic data that encodes temporal, spatial, and counterfactual relations. It reveals a strong position-based heuristic: models largely rely on the order in which events are mentioned, and even when this bias is mitigated, they fall prey to the post hoc fallacy. With the position bias removed, LLMs can correctly deduce the absence of causal relations from temporal and spatial relations but still struggle to infer causal relations from counterfactuals, and scaling alone does not fix these limits. Finetuning with explicit causal statements can mitigate the post hoc fallacy, suggesting that current LLMs may not reliably acquire novel causal knowledge without deliberate training interventions. Overall, the study highlights the importance of controlled synthetic data for disentangling memorization from inference in causal reasoning, and points to targeted mitigation strategies for robust causal understanding in language models.
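Why randomizing mention order exposes a position heuristic can be illustrated with a toy sketch. It is a hypothetical setup, not the authors' data or code: every pair has a fixed ground-truth direction (cause, effect), and a predictor that always names the first-mentioned event as the cause is scored on a dataset where the cause is always mentioned first versus one where mention order is shuffled. The names `make_examples`, `position_heuristic`, and the `event_i` labels are illustrative assumptions.

```python
import random


def make_examples(n: int, randomize_order: bool, seed: int = 0) -> list[dict]:
    """Build hypothetical (cause, effect) pairs and record their mention order."""
    rng = random.Random(seed)
    examples = []
    for i in range(n):
        cause, effect = f"event_{2 * i}", f"event_{2 * i + 1}"
        if randomize_order and rng.random() < 0.5:
            mentions = (effect, cause)  # effect is mentioned first
        else:
            mentions = (cause, effect)  # cause is mentioned first
        examples.append({"mentions": mentions, "label": (cause, effect)})
    return examples


def position_heuristic(example: dict) -> tuple[str, str]:
    """Predict that the first-mentioned event causes the second one."""
    first, second = example["mentions"]
    return (first, second)


def accuracy(examples: list[dict]) -> float:
    """Fraction of examples where the heuristic recovers the true direction."""
    hits = sum(position_heuristic(ex) == ex["label"] for ex in examples)
    return hits / len(examples)


fixed_acc = accuracy(make_examples(1000, randomize_order=False))
random_acc = accuracy(make_examples(1000, randomize_order=True))
print(f"fixed order: {fixed_acc:.2f}, randomized order: {random_acc:.2f}")
```

By construction the heuristic is perfect when order always tracks causation and drops to roughly chance once order is randomized, which is why order randomization is needed before any remaining errors can be attributed to other shortcuts such as the post hoc fallacy.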
Abstract
Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear whether this success is limited to causal facts explicitly mentioned in the pretraining data, which the model can memorize. Thus, this work investigates: Can LLMs infer causal relations from other relational data in text? To disentangle the role of memorized causal facts vs. inferred causal relations, we finetune LLMs on synthetic data containing temporal, spatial and counterfactual relations, and measure whether the LLM can then infer causal relations. We find that: (a) LLMs are susceptible to inferring causal relations from the order of two entity mentions in text (e.g., X mentioned before Y implies X causes Y); (b) if the order is randomized, LLMs still suffer from the post hoc fallacy, i.e., X occurs before Y (temporal relation) implies X causes Y. We also find that while LLMs can correctly deduce the absence of causal relations from temporal and spatial relations, they have difficulty inferring causal relations from counterfactuals, which calls into question their understanding of causality.
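The distinction between valid inferences and the post hoc fallacy can be made concrete with a small rule table. This is a hypothetical sketch, not the paper's formalization: the relation labels `"before"`, `"after"`, and `"counterfactual"` are assumptions, spatial relations are left out because the abstract does not spell out the exact rule used, and reading counterfactual dependence ("had X not occurred, Y would not have occurred") as establishing causation is itself an assumption of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CAUSES = "X causes Y"
    DOES_NOT_CAUSE = "X does not cause Y"
    UNDETERMINED = "cannot be determined"


@dataclass(frozen=True)
class Relation:
    kind: str  # hypothetical labels: "before", "after", or "counterfactual"
    x: str
    y: str


def valid_inference(rel: Relation) -> Verdict:
    """Conclusions licensed by a single stated relation between X and Y."""
    if rel.kind == "after":
        # X occurs after Y: a cause cannot follow its effect.
        return Verdict.DOES_NOT_CAUSE
    if rel.kind == "counterfactual":
        # Assumed reading: Y counterfactually depends on X, so X causes Y.
        return Verdict.CAUSES
    if rel.kind == "before":
        # Precedence is necessary for causation but not sufficient.
        return Verdict.UNDETERMINED
    raise ValueError(f"unknown relation kind: {rel.kind}")


def post_hoc_inference(rel: Relation) -> Verdict:
    """The post hoc fallacy: treat temporal precedence as causation."""
    if rel.kind == "before":
        return Verdict.CAUSES
    return valid_inference(rel)


for rel in [Relation("before", "X", "Y"), Relation("after", "X", "Y"),
            Relation("counterfactual", "X", "Y")]:
    print(f"{rel.kind:>14}: valid -> {valid_inference(rel).value}; "
          f"post hoc -> {post_hoc_inference(rel).value}")
```

The two functions disagree only on the `"before"` case, which is exactly where a model that has absorbed the post hoc shortcut would assert a causal link that the evidence does not support.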
