Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning

Zijun Chen, Wenbo Hu, Richang Hong

TL;DR

This work tackles the unreliability of Chain-of-Thought reasoning caused by error accumulation by exploiting truthfulness-sensitive activations found in middle Transformer layers. It introduces a probing-based approach to identify attention heads that reflect the veracity of intermediate steps, and trains a calibration-friendly confidence predictor using a multi-head activation representation. The predictor guides a confidence-aware beam search to select high-quality reasoning steps, combining predictor confidence with generation probabilities via a balanced scoring function. Results show substantial improvements over state-of-the-art CoT decoding methods across unimodal and multimodal tasks and model scales, and reveal a compatible self-correction mechanism, suggesting broad practical gains for reliable multi-step reasoning.
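The confidence-aware beam search described above can be illustrated with a minimal sketch. The summary does not give the exact form of the balanced scoring function, so the weighted sum of log-probability and log-confidence used here, the weight `alpha`, and the names `Beam`, `combined_score`, `beam_step`, and `toy_expand` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Beam:
    steps: list = field(default_factory=list)  # reasoning steps chosen so far
    score: float = 0.0                         # accumulated combined score


def combined_score(gen_logprob, confidence, alpha=0.5, eps=1e-9):
    # Assumed form: weighted sum of generation log-probability and log confidence.
    # alpha = 0 ignores the predictor; alpha = 1 ignores the generation probability.
    return (1.0 - alpha) * gen_logprob + alpha * math.log(max(confidence, eps))


def beam_step(beams, expand, beam_width=2, alpha=0.5):
    # expand(beam) yields (step_text, gen_logprob, confidence) candidates.
    candidates = []
    for beam in beams:
        for step, logprob, conf in expand(beam):
            new_score = beam.score + combined_score(logprob, conf, alpha)
            candidates.append(Beam(beam.steps + [step], new_score))
    # Keep only the beam_width highest-scoring partial reasoning chains.
    candidates.sort(key=lambda b: b.score, reverse=True)
    return candidates[:beam_width]


def toy_expand(beam):
    # Hypothetical candidates standing in for model samples and predictor outputs.
    return [
        ("fluent but wrong step", -0.1, 0.2),
        ("less likely but correct step", -0.5, 0.9),
    ]


best = beam_step([Beam()], toy_expand, beam_width=1, alpha=0.5)[0]
print(best.steps)
```

In this toy setting, `alpha=0.5` selects the step the predictor trusts even though it has a lower generation probability, while `alpha=0.0` reduces to ordinary likelihood-based selection and picks the more probable step.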

Abstract

Chain of Thought (CoT) reasoning has demonstrated remarkable deep reasoning capabilities in both large language models (LLMs) and multimodal large language models (MLLMs). However, its reliability is often undermined by the accumulation of errors in intermediate steps. This paper introduces a novel approach to calibrate CoT reasoning accuracy by leveraging the model's intrinsic veracity encoding. We discover that specific attention head activations reliably reflect the truthfulness of reasoning steps in CoT. Based on this insight, we train a confidence predictor to evaluate the correctness of each reasoning step using these truthfulness-sensitive activations, dynamically selecting the most plausible reasoning path via beam search. Experimental results demonstrate that our method significantly outperforms state-of-the-art baselines (e.g., Few-Shot CoT, Self-Consistency, and Self-Evaluation Guided Beam Search) across mathematical, symbolic, and commonsense reasoning tasks, exhibiting superior accuracy and reliability in both unimodal and multimodal settings. We further validate the approach on large reasoning models, confirming its applicability to specialized reasoning models. Additionally, we explore the role of the model's self-correction ability in CoT reasoning. This work provides a novel reliability improvement path for CoT reasoning with broad application potential.
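As a rough illustration of how truthfulness-sensitive attention heads might be located, the sketch below fits one linear probe per head on binary-labeled step activations and ranks heads by held-out accuracy. It uses a simple mean-difference probe for brevity, which may differ from the probe used in the paper. The function names, the `(layer, head)` keys, and the synthetic activations are hypothetical; real activations would have to be extracted from the model.

```python
import random


def mean_vec(rows):
    # Column-wise mean of a list of equal-length vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_mean_diff_probe(acts, labels):
    # Direction = mean(correct steps) - mean(incorrect steps);
    # threshold = projection of the midpoint between the two class means.
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    mu_pos, mu_neg = mean_vec(pos), mean_vec(neg)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    threshold = sum(d * (p + n) / 2 for d, p, n in zip(direction, mu_pos, mu_neg))
    return direction, threshold


def probe_accuracy(probe, acts, labels):
    direction, threshold = probe
    hits = 0
    for a, y in zip(acts, labels):
        pred = 1 if sum(d * x for d, x in zip(direction, a)) > threshold else 0
        hits += int(pred == y)
    return hits / len(labels)


def rank_heads(train, val, labels_train, labels_val):
    # train/val map (layer, head) -> list of activation vectors, one per step.
    scores = {}
    for key in train:
        probe = fit_mean_diff_probe(train[key], labels_train)
        scores[key] = probe_accuracy(probe, val[key], labels_val)
    return sorted(scores, key=scores.get, reverse=True), scores


# Synthetic stand-in data: head (5, 3) encodes the label, head (0, 0) is noise.
rng = random.Random(0)


def make_head(labels, shift, dim=4):
    return [[rng.gauss(0.0, 1.0) + shift * y for _ in range(dim)] for y in labels]


labels_train = [i % 2 for i in range(200)]
labels_val = [i % 2 for i in range(100)]
train = {(5, 3): make_head(labels_train, 3.0), (0, 0): make_head(labels_train, 0.0)}
val = {(5, 3): make_head(labels_val, 3.0), (0, 0): make_head(labels_val, 0.0)}
ranked, scores = rank_heads(train, val, labels_train, labels_val)
print(ranked[0])
```

Under these assumptions, activations from the top-ranked heads could then be concatenated into a single feature vector for the confidence predictor.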

Paper Structure

This paper contains 16 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Demonstration of the dissociation between surface-level generation probabilities and latent cognition in reasoning chains. The internal activations contain hidden representations of truthful information, and these truthfulness-sensitive activations are concentrated primarily in the intermediate layers.
  • Figure 2: An overview of our method. Binary-labeled data is used to train a confidence predictor, which is then introduced into the CoT reasoning process to select the highest-confidence candidates among those generated.
  • Figure 3: The constructed binary-labeled CoT dataset.
  • Figure 4: Calibration curves of LLaMA2-13B-Chat on SciQ; closer proximity to the Ideal Calibration curve indicates better calibration.
  • Figure 5: The effect of confidence-guided reasoning (LLaMA2-13B).