Is Knowledge All Large Language Models Needed for Causal Reasoning?

Hengrui Cai; Shengjie Liu; Rui Song

TL;DR

This work investigates how LLMs perform causal reasoning by decomposing inputs into context, embedded knowledge, and explicit numerical data using a do-operator–based attribution framework. It demonstrates that knowledge plays the dominant role in LLM causal inference, while data offers limited, context-dependent support, and introduces a fine-tuning approach (LoRA on Mistral-7B-v0.2) for robust pairwise causal discovery that leverages both knowledge and numerical cues. The study employs nine diverse datasets and a suite of prompting ablations, reverse-discovery tests, and rigorous metrics (TDR, F1, SHD, FDR) to quantify attribution components (CAK, CAD, MAD, MAK). These findings highlight the practical potential of attribution-guided fine-tuning to enhance causal reasoning in LLMs, while acknowledging benchmarks and uncertainty considerations as directions for future work.
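The edge-level metrics named above can be illustrated with a small sketch. It assumes graphs are given as 0/1 adjacency matrices, takes TDR to be the share of predicted edges that are correct (so TDR = 1 - FDR), and counts a reversed edge as a single SHD error. These conventions and the helper names `edge_set` and `discovery_metrics` are assumptions made for illustration; the paper's exact definitions may differ.

```python
def edge_set(adj):
    """Return the set of directed edges (i, j) for which adj[i][j] is nonzero."""
    return {(i, j) for i, row in enumerate(adj) for j, v in enumerate(row) if v}


def discovery_metrics(true_adj, pred_adj):
    """Compute TDR, FDR, F1 and SHD for a predicted graph (assumed conventions)."""
    true_e, pred_e = edge_set(true_adj), edge_set(pred_adj)
    tp = len(true_e & pred_e)   # correctly predicted directed edges
    fp = len(pred_e - true_e)   # predicted edges absent from the true graph
    fn = len(true_e - pred_e)   # true edges that were missed

    tdr = tp / (tp + fp) if pred_e else 0.0      # assumed: precision over edges
    fdr = fp / (tp + fp) if pred_e else 0.0
    recall = tp / (tp + fn) if true_e else 0.0
    f1 = 2 * tdr * recall / (tdr + recall) if (tdr + recall) else 0.0

    # SHD: number of node pairs whose edge status differs between the graphs;
    # a reversed edge is counted once (one common convention).
    n = len(true_adj)
    shd = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (true_adj[i][j], true_adj[j][i]) != (pred_adj[i][j], pred_adj[j][i])
    )
    return {"TDR": tdr, "FDR": fdr, "F1": f1, "SHD": shd}


# Toy graphs: truth is 0 -> 1 -> 2; prediction keeps 0 -> 1,
# reverses 1 -> 2, and adds a spurious 0 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
pred_adj = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
metrics = discovery_metrics(true_adj, pred_adj)
print(metrics)
```

In this toy case one of three predicted edges is correct, and the two graphs disagree on two node pairs.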

Abstract

This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios, allowing us to systematically quantify the influence of input numerical data and LLMs' pre-existing knowledge on their causal reasoning processes. Our newly developed experimental setup assesses LLMs' reliance on contextual information and inherent knowledge across various domains. Our evaluation reveals that LLMs' causal reasoning ability mainly depends on the context and domain-specific knowledge provided. In the absence of such knowledge, LLMs can still maintain a degree of causal reasoning using the available numerical data, albeit with limitations in the calculations. This motivates the proposed fine-tuned LLM for pairwise causal discovery, effectively leveraging both knowledge and numerical information.
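The idea of attributing accuracy to knowledge versus data can be sketched as simple contrasts between prompting conditions. The sketch below assumes each condition is a pair (knowledge included, data included) with one correctness flag per question, and it reports accuracy differences between conditions. The condition encoding, the contrast names, and the toy flags are all hypothetical placeholders; the paper's own attribution scores are defined formally there and are not reproduced here.

```python
def accuracy(flags):
    """Fraction of questions answered correctly (flags are 0/1)."""
    return sum(flags) / len(flags)


def attribution_contrasts(correct):
    """Accuracy contrasts between prompting conditions.

    `correct` maps (knowledge_included, data_included) to a list of 0/1
    correctness flags. The contrast names are illustrative, not the paper's.
    """
    acc = {cond: accuracy(flags) for cond, flags in correct.items()}
    full, no_knowledge = acc[(True, True)], acc[(False, True)]
    no_data, neither = acc[(True, False)], acc[(False, False)]
    return {
        # gain from knowledge when numerical data is present
        "knowledge_given_data": full - no_knowledge,
        # gain from numerical data when knowledge is present
        "data_given_knowledge": full - no_data,
        # gain from knowledge alone over the no-information baseline
        "knowledge_alone": no_data - neither,
        # gain from data alone over the no-information baseline
        "data_alone": no_knowledge - neither,
    }


# Toy correctness flags (placeholders, NOT results from the paper).
toy_correct = {
    (True, True): [1, 1, 1, 1, 0],
    (False, True): [1, 0, 1, 0, 0],
    (True, False): [1, 1, 1, 0, 0],
    (False, False): [0, 1, 0, 0, 0],
}
contrasts = attribution_contrasts(toy_correct)
print(contrasts)
```

Reporting the contrasts both with and without the other component present keeps the two readings separate: how much a component adds on top of the other, and how much it contributes on its own.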

Paper Structure (30 sections, 1 theorem, 9 equations, 10 figures, 8 tables)

Introduction
Proposed Framework
Causal Attribution Model
Causal Discovery Task and Terminology
Experiment Design
Dataset Construction for Causal Reasoning
Optimal Prompt Training
Ability Attribution: Omit Knowledge
Ability Attribution: Omit Data
Ability Attribution: Random Guess
Pairwise Causal Discovery Task
Reverse Causal Discovery
Estimations of Proposed Attribution Scores and Evaluation Metrics
Experiment Results of Causal Attribution Model
Implementation and Evaluation Metrics
...and 15 more sections

Key Result

Theorem 3.1

(Shimizu et al., 2006) In the linear non-Gaussian noise setting, if the true structural causal model is $Y := f(X) + U,\ X \perp\!\!\!\perp U$, then there does not exist a structural causal model in the reverse direction, $X := g(Y) + \tilde{U},\ Y \perp\!\!\!\perp \tilde{U}$, that can generate da
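The asymmetry behind this result can be seen in a small simulation. The sketch below assumes a linear model $Y = X + U$ with independent uniform $X$ and $U$, fits least-squares regressions in both directions, and uses the correlation between squared residuals and squared regressor as a crude dependence score. That score is an illustrative stand-in for a proper independence test and is not a procedure taken from the paper; all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def dependence_score(target, regressor):
    """Regress target on regressor by OLS, then measure residual dependence.

    Residuals are uncorrelated with the regressor by construction, so the
    score correlates their squares to pick up higher-order dependence.
    """
    mt, mr = mean(target), mean(regressor)
    centered = [r - mr for r in regressor]
    slope = sum(c * (t - mt) for c, t in zip(centered, target)) / sum(
        c * c for c in centered
    )
    residuals = [(t - mt) - slope * c for t, c in zip(target, centered)]
    return abs(corr([e * e for e in residuals], [c * c for c in centered]))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
x = [rng.uniform(-1, 1) for _ in range(n)]   # non-Gaussian cause
u = [rng.uniform(-1, 1) for _ in range(n)]   # non-Gaussian noise, independent of x
y = [xi + ui for xi, ui in zip(x, u)]        # true model: Y := X + U

forward = dependence_score(y, x)    # causal direction: residual is close to U
backward = dependence_score(x, y)   # reverse direction: residual depends on Y
print(f"forward={forward:.3f} backward={backward:.3f}")
```

In the causal direction the residual is essentially the noise $U$, so the score stays near zero. In the reverse direction the residual remains dependent on $Y$, and the score is clearly larger; with Gaussian $X$ and $U$ this gap would vanish, which is why non-Gaussianity matters here.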

Figures (10)

Figure 1: Ability attribution on answering the causal question by generating counterfactual examples.
Figure 2: Experiment design for LLMs' answering causal questions with encouraging prompts.
Figure 3: Illustration of the experiment design of the pairwise causal discovery task.
Figure 4: Illustration of the reverse causal discovery task.
Figure 5: The differences between attribution scores (MAK-MAD) among LLMs to demonstrate a hierarchy in their knowledge depth.
...and 5 more figures

Theorems & Definitions (5)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Theorem 3.1
