Large Language Models for Causal Discovery: Current Landscape and Future Directions

Guangya Wan; Yunsheng Lu; Yuqi Wu; Mengxuan Hu; Sheng Li

TL;DR

This paper surveys the intersection of Large Language Models (LLMs) and causal discovery (CD), arguing that LLMs can act as scalable meta-experts to accelerate structure learning, refine causal graphs, and inject domain knowledge. It categorizes LLM contributions into three roles (direct inference from metadata, posterior correction of statistically derived structures, and priors that guide traditional CD methods) and reviews benchmarks, datasets, and real-world applications. While LLMs show promise for rapid hypothesis generation and knowledge integration, the paper also acknowledges limitations such as hallucinations and reliance on priors, and it calls for unified evaluation protocols and validation focused on structural causal models (SCMs). By outlining a framework for evaluation and future directions, the work aims to democratize CD, enable domain-specific expert agents, and strengthen the reliability of causal reasoning across fields like healthcare, finance, and engineering.

Abstract

Causal discovery (CD) and Large Language Models (LLMs) have emerged as transformative fields in artificial intelligence that have evolved largely independently. While CD specializes in uncovering cause-effect relationships from data, and LLMs excel at natural language processing and generation, their integration presents unique opportunities for advancing causal understanding. This survey examines how LLMs are transforming CD across three key dimensions: direct causal extraction from text, integration of domain knowledge into statistical methods, and refinement of causal structures. We systematically analyze approaches that leverage LLMs for CD tasks, highlighting their innovative use of metadata and natural language for causal inference. Our analysis reveals both LLMs' potential to enhance traditional CD methods and their current limitations as imperfect expert systems. We identify key research gaps, outline evaluation frameworks and benchmarks for LLM-based causal discovery, and advocate future research efforts for leveraging LLMs in causality research. As the first comprehensive examination of the synergy between LLMs and CD, this work lays the groundwork for future advances in the field.

Paper Structure (19 sections, 3 equations, 4 figures, 2 tables)

Introduction
Background
Graphical Models
Causal Discovery
Tasks and Evaluation Metrics
Large Language Model
LLMs for Causal Discovery
LLMs as Direct Inference
LLMs as Posterior Correction
LLMs as Prior Knowledge
Evaluations and Applications
Benchmarks and Datasets
Applications
Challenges and Visions
Unified Evaluation and Fair Comparison
...and 4 more sections

Figures (4)

Figure 1: Overview of LLM-based causal discovery methods, categorized by their role: direct evaluation, prior knowledge augmentation, and post-hoc refinement. Evaluation strategies and future directions are also outlined.
Figure 2: Two Markov equivalent DAGs
Figure 3: Three distinct approaches for applying LLMs in causal discovery: (a) Direct causal inference without observational data, (b) Post-hoc refinement of statistically derived causal structures, and (c) Integration of prior knowledge into traditional statistical methods. The figure highlights the increasing automation and precision of causal discovery through LLMs, reducing the need for manual expert input.
Figure 4: General Workflow of Applying LLMs to Conditional Independence Tests to Solve Constraint-Based Discovery Tasks.
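
The notion of Markov equivalence named in the Figure 2 caption can be illustrated with a small check. The sketch below relies on the standard characterization that two DAGs over the same nodes are Markov equivalent exactly when they share the same skeleton and the same v-structures. The example graphs (`chain`, `reversed_chain`, `collider`) and all function names are hypothetical illustrations, not the graphs shown in the paper's figure, and the check assumes both graphs have the same node set with no isolated nodes.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG: each edge becomes an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (assumes identical node sets)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Hypothetical three-node examples, each edge written as (cause, effect).
chain = [("X", "Y"), ("Y", "Z")]
reversed_chain = [("Z", "Y"), ("Y", "X")]
collider = [("X", "Y"), ("Z", "Y")]

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal imply the same conditional independencies, so observational data alone cannot distinguish them. This is the kind of ambiguity where, per the survey's framing, additional knowledge such as LLM-provided priors could help orient edges.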
