ALCM: Autonomous LLM-Augmented Causal Discovery Framework

Elahe Khatibi, Mahyar Abbasian, Zhongqi Yang, Iman Azimi, Amir M. Rahmani

TL;DR

The paper addresses causal discovery in high-dimensional data, an NP-hard problem, by proposing ALCM, an autonomous framework that combines conventional data-driven causal discovery methods with Large Language Models (LLMs). It introduces three interacting components: a causal structure learning module, a causal wrapper for context-aware prompting, and an LLM-driven refiner that validates individual edges and uncovers hidden relations. Two concrete implementations, ALCM-PC and ALCM-Hybrid, show that dynamic weighting and LLM-based refinement improve accuracy and interpretability across seven benchmark datasets. The results suggest that using LLMs to augment, not replace, data-driven causal discovery yields more robust and explainable causal graphs, which matters for domains that require scalable, autonomous causal reasoning. Future work points to grounding the framework in knowledge graphs and retrieval-augmented setups to further mitigate hallucinations.

Abstract

To perform effective causal inference in high-dimensional datasets, it is imperative to begin with causal discovery, in which a causal graph is generated from observational data. However, obtaining a complete and accurate causal graph is a formidable challenge, recognized as an NP-hard problem. Recently, the advent of Large Language Models (LLMs) has ushered in a new era, revealing their emergent capabilities and widespread applicability in facilitating causal reasoning across diverse domains, such as medicine, finance, and science. The expansive knowledge base of LLMs holds the potential to elevate the field of causal reasoning by offering interpretability, supporting inference, improving generalizability, and uncovering novel causal structures. In this paper, we introduce a new framework, named Autonomous LLM-Augmented Causal Discovery Framework (ALCM), to synergize data-driven causal discovery algorithms and LLMs, automating the generation of a more resilient, accurate, and explicable causal graph. ALCM consists of three integral components: causal structure learning, a causal wrapper, and an LLM-driven causal refiner. These components autonomously collaborate within a dynamic environment to address causal discovery questions and deliver plausible causal graphs. We evaluate the ALCM framework by implementing two demonstrations on seven well-known datasets. Experimental results demonstrate that ALCM outperforms existing LLM methods and conventional data-driven causal reasoning mechanisms. This study not only shows the effectiveness of ALCM but also underscores new research directions in leveraging the causal reasoning capabilities of LLMs.
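
The three-component flow described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the function names (`wrap_edge`, `refine_graph`, `stub_llm`), the prompt wording, the keep/reverse/drop vocabulary, and the toy variables are all assumptions made for illustration. The candidate edges stand in for the output of a data-driven structure learner, and the LLM is replaced by a hard-coded stub so the example runs without any external service. Discovery of new, hidden edges is not covered here.

```python
from typing import Callable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def wrap_edge(edge: Edge, context: str) -> str:
    """Hypothetical causal wrapper: turn one candidate edge into a prompt."""
    cause, effect = edge
    return (
        f"Context: {context}\n"
        f"Candidate causal edge: {cause} -> {effect}\n"
        "Answer with exactly one word: keep, reverse, or drop."
    )


def refine_graph(
    candidates: Set[Edge],
    context: str,
    ask_llm: Callable[[str], str],
) -> Set[Edge]:
    """Hypothetical refiner: validate each data-driven edge with an LLM verdict."""
    refined: Set[Edge] = set()
    for edge in sorted(candidates):
        verdict = ask_llm(wrap_edge(edge, context)).strip().lower()
        if verdict == "drop":
            continue  # the LLM rejects the edge
        if verdict == "reverse":
            refined.add((edge[1], edge[0]))  # flip the direction
        else:
            # "keep" or any unrecognized answer: fall back to the data-driven edge
            refined.add(edge)
    return refined


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; answers are hard-coded for this toy case."""
    if "smoking -> cancer" in prompt:
        return "keep"
    if "cancer -> tar" in prompt:
        return "reverse"
    return "drop"


# Assumed output of a structure learner on toy variables (not real results).
candidates = {
    ("smoking", "cancer"),
    ("cancer", "tar"),
    ("yellow_fingers", "cancer"),
}
refined = refine_graph(candidates, "toy lung-health variables", stub_llm)
print(sorted(refined))
```

Falling back to the data-driven edge when the answer is unrecognized is one possible design choice; it mirrors the idea that the LLM augments the structure learner instead of replacing it.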

Paper Structure

This paper contains 18 sections, 6 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: ALCM Architecture
  • Figure 2: Causal Prompt Demonstration
  • Figure 3: Prompt Template
  • Figure 4: Causal graphs Demonstrations
  • Figure 5: Additive Contribution on Causal Discovery Accuracy on Neuropathic Pain
  • ...and 3 more figures