Understanding Causality with Large Language Models: Feasibility and Opportunities
Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan
TL;DR
The paper evaluates how well large language models can answer causal questions, identifying strong performance on knowledge-based causal inquiries but significant gaps in discovering new causal relationships and in making high-stakes decisions that demand high precision. It argues that bridging these gaps requires integrating causal machine learning with LLMs, either through modular causal components or a new causality-aware training paradigm. The proposed directions aim to improve the trustworthiness, efficiency, and applicability of LLMs in real-world causal reasoning across domains. If realized, these approaches could substantially broaden the impact of LLMs on science, industry, and decision-making.
Abstract
We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses across three types of causal questions. We believe that current LLMs can answer causal questions that draw on existing causal knowledge, acting as combined domain experts. However, they cannot yet provide satisfactory answers when the task is to discover new knowledge or to make high-stakes decisions that demand high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These directions would not only enable LLMs to answer many more types of causal questions, for greater impact, but also make LLMs more trustworthy and efficient in general.
