Leveraging Large Language Models for Causal Discovery: a Constraint-based, Argumentation-driven Approach

Zihao Li, Fabrizio Russo

TL;DR

This work explores the use of large language models (LLMs) as imperfect experts for Causal ABA, eliciting semantic structural priors from variable names and descriptions and integrating them with conditional-independence evidence.

Abstract

Causal discovery seeks to uncover causal relations from data, typically represented as causal graphs, and is essential for predicting the effects of interventions. While expert knowledge is required to construct principled causal graphs, many statistical methods have been proposed to leverage observational data with varying formal guarantees. Causal Assumption-based Argumentation (ABA) is a framework that uses symbolic reasoning to ensure correspondence between input constraints and output graphs, while offering a principled way to combine data and expertise. We explore the use of large language models (LLMs) as imperfect experts for Causal ABA, eliciting semantic structural priors from variable names and descriptions and integrating them with conditional-independence evidence. Experiments on standard benchmarks and semantically grounded synthetic graphs demonstrate state-of-the-art performance, and we additionally introduce an evaluation protocol to mitigate memorisation bias when assessing LLMs for causal discovery.

Paper Structure (34 sections, 16 figures, 4 tables)

This paper contains 34 sections, 16 figures, 4 tables.

Figures (16)

  • Figure 1: LLM Integration Pipeline: Given a set of variables with their names and (possibly) descriptions, we prompt an LLM to generate pairwise causal statements. These statements are then parsed into structured assumptions for Causal ABA, which combines them with data-derived independence or arrow constraints and background knowledge to infer a set of causal graphs. Expert knowledge can be injected in the LLM prompts or as defeasible facts. Variable descriptions and LLM parsing are optional components of the pipeline, but they enhance the quality of the generated assumptions. Detailed prompts and parsing rules are provided in Appendix \ref{sec:appendix_prompts}.
  • Figure 2: Synthetic Evaluation Protocol: We generate random DAGs and ground them in CauseNet by finding sub-graph isomorphisms. Given that there might be many possible isomorphisms, we use a heuristic composed of three scores to select the most suitable one. The output is a semantically grounded DAG with variable names that can be used to evaluate the robustness of LLMs in generating causal assumptions.
  • Figure 3: Normalised Structural Hamming Distance (SHD, left axis) and F1-score (right axis) of LLM-augmented Causal ABA against baselines across synthetic datasets generated from the CauseNet Knowledge Graph (see Section \ref{sec:eval_strategy}), grouped by number of nodes ($|\mathbf{V}| \in \{5,10,15\}$). Error bars are standard deviations over 50 repetitions.
  • Figure 4: CauseNet Synthetic: Heatmap showing how the quality of LLM-derived and data-derived constraints relates to changes in final graph reconstruction accuracy after integration ($\Delta$F1 against the true DAG; the number of constraints $n$ is given in brackets).
  • Figure 5: Examples of semantically grounded DAGs from our synthetic data pipeline, generated with different structural methods (ER, SF, LT) and semantic heuristics (none, degrees, semantics).
  • ...and 11 more figures
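
The two metrics named in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names are hypothetical, a reversed edge is counted as a single error (a common convention), and the SHD is normalised by the number of edges in the true DAG, which is an assumption: the caption does not state the normalisation used.

```python
from itertools import combinations


def shd(true_edges, pred_edges, nodes):
    """Structural Hamming Distance between two directed graphs.

    Each unordered node pair whose edge state (absent, a->b, b->a)
    differs between the graphs counts as one error, so a reversed
    edge contributes 1 (assumed convention).
    """
    errors = 0
    for a, b in combinations(nodes, 2):
        true_state = ((a, b) in true_edges, (b, a) in true_edges)
        pred_state = ((a, b) in pred_edges, (b, a) in pred_edges)
        if true_state != pred_state:
            errors += 1
    return errors


def normalised_shd(true_edges, pred_edges, nodes):
    # Assumption: normalise by the number of true edges (guarding against 0).
    return shd(true_edges, pred_edges, nodes) / max(len(true_edges), 1)


def f1_score(true_edges, pred_edges):
    """F1 over directed edges: an edge is correct only if its direction matches."""
    true_positives = len(true_edges & pred_edges)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_edges)
    recall = true_positives / len(true_edges)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up graphs (not data from the paper).
nodes = ["A", "B", "C", "D"]
true_dag = {("A", "B"), ("B", "C"), ("C", "D")}
pred_dag = {("A", "B"), ("C", "B"), ("A", "D")}

print(shd(true_dag, pred_dag, nodes))             # one extra, one reversed, one missing edge
print(normalised_shd(true_dag, pred_dag, nodes))
print(f1_score(true_dag, pred_dag))
```

In the toy example the prediction shares one correctly oriented edge with the true DAG, so lower normalised SHD and higher F1 would both indicate a better reconstruction.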

Theorems & Definitions (1)

  • Example 3.1