A Survey on Enhancing Causal Reasoning Ability of Large Language Models
Xin Li, Zhuo Cai, Shoujin Wang, Kun Yu, Fang Chen
TL;DR
This survey addresses the gap in understanding how to enhance causal reasoning in large language models (LLMs) by proposing a taxonomy that splits methods into domain-knowledge-driven and model-driven approaches. It details subcategories including domain experts, contextual knowledge, predefined prompts, fine-tuning, causal graph construction, causal effect estimation, and counterfactual reasoning, and compares their strengths and weaknesses. The paper compiles benchmarks and metrics such as QRDATA, CLEAR, CLADDER, CausalProbe-2024, SHD, SID, CESAR, and CausalScore to standardize evaluation, and outlines future directions spanning multi-modal reasoning, memory mechanisms, self-learning, ethical alignment, and unified datasets. Overall, it provides a structured overview to guide researchers in evaluating and improving LLMs' causal reasoning capabilities, with a view toward real-world applicability and trustworthy AI.
Abstract
Large language models (LLMs) have recently shown remarkable performance in language tasks and beyond. However, because their inherent causal reasoning ability is limited, LLMs still struggle with tasks that require robust causal reasoning, such as healthcare and economic analysis. As a result, a growing body of research has focused on enhancing the causal reasoning ability of LLMs. Despite this rapid growth, no survey yet comprehensively reviews the challenges, progress, and future directions in this area. To bridge this gap, we systematically review the literature on how to strengthen LLMs' causal reasoning ability. We begin with the background and motivation of the topic, followed by a summary of the key challenges. We then propose a novel taxonomy to systematically categorise existing methods, together with detailed comparisons within and between classes of methods. Furthermore, we summarise existing benchmarks and evaluation metrics for assessing LLMs' causal reasoning ability. Finally, we outline future research directions for this emerging field, offering insights and inspiration to researchers and practitioners in the area.
