Unveiling and Causalizing CoT: A Causal Perspective

Jiarun Fu, Lizhong Ding, Hao Li, Pengqi Li, Qiuning Wei, Xu Chen

TL;DR

This work reframes Chain-of-Thought (CoT) reasoning in LLMs as a causal process modeled by structural causal models (SCMs). It defines metrics such as the CoT Average Causal Effect (CACE) and the First-Step Causal Effect (FSCE) to quantify causal links between CoT steps and the final answer, and introduces CauCoT, a two-stage, role-playing causalization algorithm that iteratively enforces correct causal relations across all reasoning steps. Across open- and closed-source LLMs on the PROCESSBENCH suite, CauCoT improves answer accuracy on complex problems and also improves the causal quality of the reasoning chain by correcting common causal errors. The approach offers a principled, interpretable way to make CoT reasoning both correct and understandable, with potential impact on the reliability and transparency of LLM-based reasoning systems.

Abstract

Although Chain-of-Thought (CoT) has achieved remarkable success in enhancing the reasoning ability of large language models (LLMs), the mechanism of CoT remains a "black box". Even when correct answers are frequently obtained, existing CoTs struggle to make the reasoning understandable to humans. In this paper, we unveil and causalize CoT from a causal perspective to ensure both the correctness and the understandability of all reasoning steps; to the best of our knowledge, this is the first work to do so. We model the causality of CoT via structural causal models (SCMs) to unveil its reasoning mechanism. To measure the causality of CoT, we define the CoT Average Causal Effect (CACE), which tests the causal relations between steps. For steps without causality (wrong or unintelligible steps), we design a role-playing causal query algorithm to causalize them, resulting in a causalized CoT in which all steps are correct and understandable. Experimental results on both open-source and closed-source LLMs demonstrate that the causal errors that commonly occur in steps are effectively corrected and that the reasoning ability of LLMs is significantly improved.
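
As background for the causal-effect terminology in the abstract, the following is a minimal sketch of the textbook average causal effect on a toy SCM. It is not the paper's CACE definition, which is not reproduced on this page. The variables loosely follow the month, rainfall, sprinkler, and slippery-pavement example of Figure 1; every probability and every function name is a hypothetical value chosen only for illustration.

```python
from itertools import product

# Hypothetical probabilities for a toy SCM (illustrative values only).
P_SUMMER = 0.5                       # P(month falls in summer)
P_RAIN = {True: 0.1, False: 0.6}     # P(rain | summer?)
P_SLIPPERY = {                       # P(slippery | rain, sprinkler)
    (False, False): 0.0,
    (False, True): 0.7,
    (True, False): 0.8,
    (True, True): 0.9,
}

def p_slippery_do_sprinkler(sprinkler_on: bool) -> float:
    """P(slippery | do(sprinkler = sprinkler_on)), by exact enumeration.

    The intervention cuts the month -> sprinkler edge, so the month
    influences the outcome only through rain.
    """
    total = 0.0
    for summer, rain in product([True, False], repeat=2):
        p_month = P_SUMMER if summer else 1.0 - P_SUMMER
        p_rain = P_RAIN[summer] if rain else 1.0 - P_RAIN[summer]
        total += p_month * p_rain * P_SLIPPERY[(rain, sprinkler_on)]
    return total

def average_causal_effect() -> float:
    """Textbook ACE: P(Y=1 | do(X=1)) - P(Y=1 | do(X=0))."""
    return p_slippery_do_sprinkler(True) - p_slippery_do_sprinkler(False)

if __name__ == "__main__":
    print(f"ACE of sprinkler on slipperiness: {average_causal_effect():.2f}")
```

A nonzero effect indicates that intervening on the cause changes the outcome distribution. In the abstract's framing, the analogous check is applied between reasoning steps rather than between physical variables.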

Paper Structure

This paper contains 32 sections, 13 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: We assume the ability of CoT to reason correctly stems from its reflection of real-world causal relationships. As shown in the figure above, in the real world, based on our life experience, we infer the causal graph between variables such as month, rainfall, sprinkler use, and pavement slipperiness, then use this graph to deduce the answer. Similarly, LLMs employ Chain-of-Thought (CoT) to perform a reasoning process that aligns with these real-world causal relationships, ultimately arriving at the correct answer.
  • Figure 2: From modeling CoT to causalizing CoT.
  • Figure 3: Causalized Evaluation on Qwen2.5-72B
  • Figure 4: Causalized Evaluation on Deepseek-v3-37B
  • Figure 5: Hyperparameter experiments
  • ...and 1 more figure

Theorems & Definitions (2)

  • Definition 1
  • Definition 2