
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Aniket Vashishtha, Abbavaram Gowtham Reddy, Abhinav Kumar, Saketh Bachu, Vineeth N Balasubramanian, Amit Sharma

TL;DR

This paper argues that the full causal graph is a brittle output when eliciting knowledge from imperfect experts such as LLMs and humans, and proposes causal order as a stable, expert-agnostic interface for leveraging domain knowledge. It introduces the triplet prompting strategy, which adds an auxiliary variable to each variable pair and enforces acyclicity within each triplet, combined with a majority-vote ensemble that produces a robust causal order with fewer cycles and lower cost than traditional pairwise prompting. The authors formalize the topological divergence $D_{top}$ as a metric tied to valid backdoor adjustment and demonstrate that a correct causal order yields reliable effect estimation, whereas the graph error measured by structural Hamming distance (SHD) can remain high even for perfect experts. Extensive experiments on real-world BNLearn graphs show that triplet prompting outperforms pairwise prompting across LLMs and human annotators, and that the resulting causal order meaningfully improves downstream causal discovery and causal effect inference, especially in data-scarce settings.

Abstract

Large Language Models (LLMs) have been used as experts to infer causal graphs, often by repeatedly applying a pairwise prompt that asks about the causal relationship of each variable pair. However, such experts, including human domain experts, cannot distinguish between direct and indirect effects given a pairwise prompt. Therefore, instead of the graph, we propose that causal order be used as a more stable output interface for utilizing expert knowledge. Even when querying a perfect expert with a pairwise prompt, we show that the inferred graph can have significant errors whereas the causal order is always correct. In practice, however, LLMs are imperfect experts and we find that pairwise prompts lead to multiple cycles. Hence, we propose the triplet method, a novel querying strategy that introduces an auxiliary variable for every variable pair and instructs the LLM to avoid cycles within this triplet. It then uses a voting-based ensemble method that results in higher accuracy and fewer cycles while ensuring cost efficiency. Across multiple real-world graphs, such a triplet-based method yields a more accurate order than the pairwise prompt, using both LLMs and human annotators. The triplet method enhances robustness by repeatedly querying an expert with different auxiliary variables, enabling smaller models like Phi-3 and Llama-3 8B Instruct to surpass GPT-4 with pairwise prompting. For practical usage, we show how the expert-provided causal order from the triplet method can be used to reduce error in downstream graph discovery and effect inference tasks.
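
The triplet-and-vote idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_expert` is a hypothetical stand-in for a triplet prompt (it answers from the known bnlearn Cancer graph), the vote labels are invented for this sketch, and ties are simply left unoriented here, whereas the paper resolves them with a higher-cost expert. It assumes Python 3.9+ for `graphlib`.

```python
from collections import Counter, defaultdict
from graphlib import TopologicalSorter
from itertools import combinations

# Ground truth of the bnlearn Cancer network, used only to build a toy expert.
TRUE_EDGES = {
    ("Pollution", "Cancer"), ("Smoker", "Cancer"),
    ("Cancer", "Xray"), ("Cancer", "Dyspnoea"),
}
VARIABLES = ["Pollution", "Smoker", "Cancer", "Xray", "Dyspnoea"]


def is_ancestor(u, v):
    """True if a directed path u -> ... -> v exists in TRUE_EDGES."""
    frontier = [u]
    while frontier:
        node = frontier.pop()
        for parent, child in TRUE_EDGES:
            if parent == node:
                if child == v:
                    return True
                frontier.append(child)
    return False


def toy_expert(a, b, c):
    """Hypothetical triplet answer: (cause, effect) pairs among a, b, c."""
    trio = (a, b, c)
    return {(u, v) for u in trio for v in trio if u != v and is_ancestor(u, v)}


def causal_order_from_triplets(variables, query_triplet):
    """Query every triplet, majority-vote each pair, return a causal order."""
    votes = defaultdict(Counter)
    for trio in combinations(variables, 3):
        answer = set(query_triplet(*trio))
        # Each pair receives one vote per triplet (auxiliary variable) it is in.
        for u, v in combinations(trio, 2):
            if (u, v) in answer:
                votes[(u, v)]["u_before_v"] += 1
            elif (v, u) in answer:
                votes[(u, v)]["v_before_u"] += 1
            else:
                votes[(u, v)]["unrelated"] += 1

    predecessors = {v: set() for v in variables}
    for (u, v), tally in votes.items():
        ranked = tally.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            continue  # tie: left unoriented in this sketch
        if ranked[0][0] == "u_before_v":
            predecessors[v].add(u)
        elif ranked[0][0] == "v_before_u":
            predecessors[u].add(v)
    # static_order() raises graphlib.CycleError if the voted edges form a cycle.
    return list(TopologicalSorter(predecessors).static_order())


order = causal_order_from_triplets(VARIABLES, toy_expert)
print(order)
```

With a real LLM or human annotator in place of `toy_expert`, individual triplet answers may disagree; the vote across auxiliary variables is what is meant to absorb such inconsistencies.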

Paper Structure

This paper contains 23 sections, 10 theorems, 6 equations, 14 figures, 36 tables, 2 algorithms.

Key Result

Proposition 3.1

Let the true causal DAG be $\mathcal{G}(\mathbf{X}, \mathbf{E})$ with ground-truth adjacency matrix $A$. Consider a procedure to estimate a graph $\hat{G}$ by querying a Perfect Expert (as in the paper's definition of a perfect expert) with pairwise queries $X_i$, $X_j$ with auxiliary set $\mathbf{O}_{ij}$, followed by

Figures (14)

  • Figure 1: Cancer dataset (bnlearn). Top: True causal graph. Bottom: Expert-estimated causal graph. Note that the latter, while not correct with respect to the true graph, yields the correct causal order.
  • Figure 2: Top: Using the pairwise prompt, even with a perfect expert (e.g., a domain expert), the estimated graph may not be correct ($SHD=1$). The causal order, however, is correct ($D_{top}=0$), which makes $D_{top}$ the better metric. Bottom: With imperfect experts such as LLMs, pairwise prompts may not lead to a valid order because they create cycles. The proposed triplet prompting strategy alleviates this issue and provides better estimates of causal order ($D_{top}=0$).
  • Figure 3: Variability of SHD for various graph sizes with $D_{top}=0$ within each graph.
  • Figure A1: Leveraging Causal Order from Imperfect Experts. Our triplet-based querying method infers all three-variable subgraphs from imperfect experts and aggregates them (using majority voting) to produce a causal order. Ties in causal order are broken using a high-cost expert. Expert-generated causal order is integrated with discovery algorithms, before estimating causal effect.
  • Figure : Integrating $\hat{\pi}$ in constraint-based methods
  • ...and 9 more figures
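
The contrast between SHD and $D_{top}$ noted in the Figure 2 caption can be reproduced on a small example. The sketch below uses a three-variable chain chosen purely for illustration (it is not taken from the figure) and assumes that $D_{top}$ counts the true edges that point backward in the estimated order, which is how topological divergence is commonly defined; the excerpt above does not state the formula. The perfect expert is assumed to answer "yes" for the indirect effect $X_0 \to X_2$, which adds one extra edge to the estimated graph.

```python
def shd(true_adj, est_adj):
    """Structural Hamming distance: node pairs whose edge status differs."""
    n = len(true_adj)
    return sum(
        (true_adj[i][j], true_adj[j][i]) != (est_adj[i][j], est_adj[j][i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def d_top(order, true_adj):
    """Assumed definition: true edges i -> j where i comes after j in order."""
    position = {node: rank for rank, node in enumerate(order)}
    n = len(true_adj)
    return sum(
        true_adj[i][j]
        for i in range(n)
        for j in range(n)
        if position[i] > position[j]
    )


# True graph: chain X0 -> X1 -> X2.
TRUE_ADJ = [[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]]
# Pairwise answers also report the indirect effect X0 -> X2 as an edge.
EST_ADJ = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0]]
# Topological order of the estimated graph.
EST_ORDER = [0, 1, 2]

shd_value = shd(TRUE_ADJ, EST_ADJ)
d_top_value = d_top(EST_ORDER, TRUE_ADJ)
print(shd_value, d_top_value)  # prints "1 0"
```

The extra edge costs one unit of SHD, yet every true edge still points forward in the estimated order, so the order remains usable for choosing a backdoor adjustment set.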

Theorems & Definitions (22)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Definition 4.1: $\epsilon$-Experts
  • Proposition 4.1
  • Definition B.1
  • ...and 12 more