Causal Inference with Large Language Model: A Survey

Jing Ma

TL;DR

The paper surveys the use of large language models (LLMs) for causal inference in natural language processing, outlining how LLMs can leverage domain knowledge, reasoning, and context to tackle causal tasks beyond traditional tabular data. It formalizes causality with structural causal models (SCMs) and Pearl's ladder of causation, then categorizes LLM approaches into prompting, fine-tuning, hybrids with conventional causal methods, and knowledge augmentation. Across causal discovery, causal effect estimation, and other tasks such as attribution, counterfactual reasoning, and explanation, the survey synthesizes datasets, evaluation results, and key insights. It highlights strong performance in pairwise causal discovery and more nuanced outcomes for higher-rung reasoning, which depend on prompting strategy and model scale. The discussion points to opportunities and challenges, including integrating human knowledge, improving data generation and robustness, mitigating hallucinations, and developing causality-focused benchmarks and models with practical impact in high-stakes domains.

Abstract

Causal inference has been a pivotal challenge across diverse domains such as medicine and economics, demanding a complex integration of human knowledge, mathematical reasoning, and data mining capabilities. Recent advancements in natural language processing (NLP), particularly with the advent of large language models (LLMs), have introduced promising opportunities for traditional causal inference tasks. This paper reviews recent progress in applying LLMs to causal inference, encompassing various tasks spanning different levels of causation. We summarize the main causal problems and approaches, and present a comparison of their evaluation results in different causal scenarios. Furthermore, we discuss key findings and outline directions for future research, underscoring the potential implications of integrating LLMs in advancing causal inference methodologies.

Paper Structure (16 sections, 2 figures, 4 tables)

This paper contains 16 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Representative causal tasks, their positions in the causal ladder, and examples of prompts w.r.t. mode, question type, and prompting strategy. PCD = pairwise causal discovery; CA = causal attribution; ATE = average treatment effect; CDE = controlled direct effect; BAJ = backdoor adjustment; CE = causal explanation; CR = counterfactual reasoning; NDE = natural direct effect.
  • Figure 2: The major causal tasks and LLMs evaluated for these tasks.