Large Language Models for Constrained-Based Causal Discovery

Kai-Hendrik Cohrs; Gherardo Varando; Emiliano Diaz; Vasileios Sitokonstantinou; Gustau Camps-Valls

TL;DR

This work investigates using large language models (LLMs) as conditional-independence (CI) oracles within the PC algorithm for causal discovery, framing CI tests as prompts to LLMs and aggregating multiple responses with a statistical voting scheme. The proposed chatPC pipeline offers a knowledge-driven alternative to data-driven CI testing. Its performance varies substantially across problems and models; GPT-4 is generally more consistent than GPT-3.5, and aggregation improves robustness. While not a perfect oracle, the LLM-based CI tests yield plausible causal graphs on several benchmark problems and show potential as a complementary tool to data-driven methods, especially when data are scarce or when prior knowledge can be leveraged. The results also highlight the importance of controlling false positives and false negatives via a principled aggregation mechanism and point to future directions, including retrieval-augmented grounding and hybrid data-knowledge integration, to enhance reliability and scalability.
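To make the aggregation idea concrete, the following is a minimal sketch of how repeated LLM answers to one CI query could be combined into YES, NO, or UNCERTAIN. It is not the paper's voting scheme: the one-sided binomial test against chance, the threshold `alpha`, and the names `binomial_tail` and `aggregate_votes` are all assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def aggregate_votes(answers, alpha: float = 0.05) -> str:
    """Aggregate repeated answers to a single CI query.

    `answers` is a list of strings; only "YES" (independent) and
    "NO" (dependent) are counted. Assumption: a one-sided binomial
    test against chance (p = 0.5) stands in for the voting scheme.
    """
    yes = sum(a == "YES" for a in answers)
    no = sum(a == "NO" for a in answers)
    n = yes + no
    if n == 0:
        return "UNCERTAIN"
    # Declare an outcome only if that many votes would be unlikely by chance.
    if binomial_tail(yes, n) <= alpha:
        return "YES"
    if binomial_tail(no, n) <= alpha:
        return "NO"
    return "UNCERTAIN"
```

Lowering `alpha` in this sketch makes the oracle abstain more often, which is one simple way to trade off false positives against false negatives.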

Abstract

Causality is essential for understanding complex systems, such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former methods, like the celebrated PC algorithm, face issues with data requirements and assumptions of causal sufficiency, while the latter demand substantial time and domain knowledge. This work explores the capabilities of Large Language Models (LLMs) as an alternative to domain experts for causal graph generation. We frame conditional independence queries as prompts to LLMs and employ the PC algorithm with the answers. The performance of the LLM-based conditional independence oracle on systems with known causal graphs shows a high degree of variability. We improve the performance through a proposed statistically inspired voting scheme that allows some control over false-positive and false-negative rates. Inspecting the chain-of-thought argumentation, we find that the model uses causal reasoning to justify its answer to a probabilistic query. We show evidence that knowledge-based conditional independence testing (CIT) could eventually become a complementary tool for data-driven causal discovery.
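As an illustration of how CI answers feed into the PC algorithm, here is a minimal sketch of the skeleton phase of PC with a pluggable oracle. The function names, the oracle signature `ci_oracle(x, y, cond) -> bool`, and the three-variable chain used as a stand-in for the LLM are hypothetical and not taken from the paper; in an LLM-based setting the oracle would wrap the prompt and the aggregated answer.

```python
from itertools import combinations


def pc_skeleton(variables, ci_oracle):
    """Skeleton phase of PC: start fully connected, remove an edge x - y
    whenever the oracle reports x independent of y given some subset of
    x's other neighbours, growing the conditioning-set size level by level."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    level = 0
    # Continue while some node has enough other neighbours to condition on.
    while any(len(adj[x]) - 1 >= level for x in variables):
        for x in variables:
            for y in list(adj[x]):
                others = adj[x] - {y}
                for cond in combinations(sorted(others), level):
                    if ci_oracle(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        level += 1
    edges = {frozenset((x, y)) for x in variables for y in adj[x]}
    return edges, sepset


def chain_oracle(x, y, cond):
    """Hypothetical stand-in for the LLM: encodes the chain A -> B -> C,
    whose only independence is A independent of C given B."""
    return {x, y} == {"A", "C"} and "B" in cond


edges, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
```

With this stub oracle the A - C edge is removed at conditioning-set size one, and the separating set {B} is recorded for the later orientation phase, which the sketch omits.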

Paper Structure (19 sections, 6 figures, 2 tables)

Introduction
Background
Data-Driven Causal Discovery
Causal Discovery with Prior Knowledge
Causality with LLMs
Conditional Independence Queries via LLM
Prompting for conditional independence testing
Testing
Evaluation
Permutation consistency
Performance of CIT
Inquiring spurious correlations
Application to Causal Discovery
Causal graphs from the examples
Conclusions
...and 4 more sections

Figures (6)

Figure 1: Illustration of the introduced scheme for PC with GPT/LLM. Credits: Little robot face by Antònia Font.
Figure 2: Confusion matrix of the model's responses to queries with changing order of $X$ and $Y$. UNCERTAIN outcomes in case of a tie in majority voting are hidden. The agreement score aggregates common YES, NO and UNCERTAIN outcomes.
Figure 3: Assumed true graph (a) and skeleton recovered (b) with the proposed chatPC approach for the burglary problem. Variables: Burglary in progress (B); earthquake (E); alarm ringing (A); Mary (M) or John (J) calling.
Figure 4: Assumed true graph (a) and skeleton recovered (b) with the proposed chatPC approach for the cancer problem. Variables: Patient is a smoker (S); patient exposed to pollution (P); patient suffers from lung cancer (C); positive results from a chest X-ray (X); patient is suffering from dyspnoea (D).
Figure 5: Assumed true graph (a) and skeleton recovered (b) with the proposed chatPC approach for the nao-dk-med problem. Variables: North-Atlantic Oscillation (NAO); summer precipitation in Denmark (DK); summer precipitation in the Mediterranean region (MED).
...and 1 more figure
