The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code
Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, Dongyan Zhao
TL;DR
This work demonstrates that Code-LLMs, when guided by code-based prompts that explicitly encode causal structures through conditional constructs, outperform text-only LLMs in abductive and counterfactual reasoning tasks. The authors design prompts where causal relations are modeled via functions and conditionals, enabling clear, branch-aware reasoning and output generation. Automatic metrics and human judgments consistently favor Codex with code prompts over text-based prompts and prior methods, and interventions reveal that programming structure is the key driver of performance while format and language perturbations are largely tolerated. The findings highlight the potential of code-focused prompting to enhance causal reasoning in LLMs, with implications for building more reliable AI systems for reasoning-intensive tasks, while also noting limitations in language scope and the need for broader, scalable prompt engineering and model access.
Abstract
Causal reasoning, the ability to identify cause-and-effect relationships, is crucial in human thinking. Although large language models (LLMs) succeed in many NLP tasks, it is still challenging for them to conduct complex causal reasoning such as abductive reasoning and counterfactual reasoning. Given that programming code may express causal relations more often and more explicitly, with conditional statements like ``if``, we explore whether Code-LLMs acquire better causal reasoning abilities. Our experiments show that, compared to text-only LLMs, Code-LLMs with code prompts are significantly better at causal reasoning. We further intervene on the prompts from different aspects and find that the programming structure is crucial in code prompt design, while Code-LLMs are robust to format perturbations.
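
To make the idea of a code prompt concrete, here is a minimal sketch of how an abductive reasoning instance might be encoded with a function and an ``if`` conditional. It is an illustration only, not the authors' actual prompt: the helper `build_abductive_prompt`, the function names inside the prompt, and the example story are all hypothetical.

```python
def build_abductive_prompt(premise: str, ending: str) -> str:
    """Build a code-style prompt for abductive reasoning.

    Hypothetical layout: the causal structure "premise, then hypothesis,
    then ending" is expressed with an `if`, and the body of `hypothesis()`
    is left open for the model to complete.
    """
    lines = [
        "def main():",
        "    premise()",
        "    if hypothesis():",  # the ending happens only if the hypothesis holds
        "        ending()",
        "",
        "def premise():",
        f"    # {premise}",
        "",
        "def ending():",
        f"    # {ending}",
        "",
        "def hypothesis():",
        "    #",  # left open: the model continues from here
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up story, used only to show the prompt shape.
    print(build_abductive_prompt(
        premise="Sam left the house without an umbrella.",
        ending="Sam arrived at work soaking wet.",
    ))
```

The string returned by the builder would be sent to a code model as a completion prompt, with the generated comment after `def hypothesis():` read back as the explanation. A counterfactual instance could plausibly be laid out the same way with an `if`/`else` pair separating the original and altered branches, though that layout is likewise an assumption here.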
