Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning
Zijun Chen, Wenbo Hu, Richang Hong
TL;DR
This work tackles the unreliability of Chain-of-Thought reasoning caused by error accumulation by exploiting truthfulness-sensitive activations found in middle Transformer layers. It introduces a probing-based approach to identify attention heads that reflect the veracity of intermediate steps, and trains a calibration-friendly confidence predictor using a multi-head activation representation. The predictor guides a confidence-aware beam search to select high-quality reasoning steps, combining predictor confidence with generation probabilities via a balanced scoring function. Results show substantial improvements over state-of-the-art CoT decoding methods across unimodal and multimodal tasks and model scales, and reveal a compatible self-correction mechanism, suggesting broad practical gains for reliable multi-step reasoning.
Abstract
Chain-of-Thought (CoT) reasoning has demonstrated remarkable deep reasoning capabilities in both large language models (LLMs) and multimodal large language models (MLLMs). However, its reliability is often undermined by the accumulation of errors in intermediate steps. This paper introduces a novel approach to calibrating CoT reasoning accuracy by leveraging the model's intrinsic veracity encoding. We discover that specific attention head activations reliably reflect the truthfulness of reasoning steps in CoT. Based on this insight, we train a confidence predictor to evaluate the correctness of each reasoning step using these truthfulness-sensitive activations, and use it to dynamically select the most plausible reasoning path via beam search. Experimental results demonstrate that our method significantly outperforms state-of-the-art baselines (e.g., Few-Shot CoT, Self-Consistency, and Self-Evaluation Guided Beam Search) across mathematical, symbolic, and commonsense reasoning tasks, exhibiting superior accuracy and reliability in both unimodal and multimodal settings. We further validate the approach on large reasoning models, confirming that it also applies to models specialized for reasoning. Additionally, we explore the role of the model's self-correction ability in CoT reasoning. This work provides a novel path to improving the reliability of CoT reasoning, with broad application potential.
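To make the confidence-guided beam search concrete, the following is a minimal sketch rather than the authors' implementation. The text above does not give the exact scoring function, so the weighted sum of log generation probability and log predictor confidence, the weight `alpha`, and the names `step_score`, `beam_search`, `toy_propose`, and `toy_confidence` are all assumptions made for illustration. In a real system, the proposal function would sample candidate steps from the model, and the confidence function would be the predictor trained on truthfulness-sensitive attention head activations.

```python
import math
from typing import Callable, List, Tuple

Path = List[str]
Candidate = Tuple[str, float]  # (step text, generation probability)


def step_score(gen_prob: float, confidence: float, alpha: float = 0.5) -> float:
    """Hypothetical balanced score for one reasoning step.

    Assumed form: (1 - alpha) * log p_gen + alpha * log confidence.
    alpha = 0 ignores the confidence predictor; alpha = 1 ignores p_gen.
    """
    eps = 1e-12  # guard against log(0)
    return ((1.0 - alpha) * math.log(max(gen_prob, eps))
            + alpha * math.log(max(confidence, eps)))


def beam_search(
    propose: Callable[[Path], List[Candidate]],
    confidence_fn: Callable[[Path, str], float],
    beam_width: int = 2,
    max_steps: int = 3,
    alpha: float = 0.5,
) -> List[Tuple[Path, float]]:
    """Keep the beam_width partial reasoning paths with the best cumulative score."""
    beams: List[Tuple[Path, float]] = [([], 0.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[Path, float]] = []
        for path, score in beams:
            for text, prob in propose(path):
                conf = confidence_fn(path, text)
                candidates.append(
                    (path + [text], score + step_score(prob, conf, alpha))
                )
        if not candidates:  # no path can be extended further
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Toy stand-ins (assumptions): the generator slightly prefers a wrong step,
# while the confidence predictor rates the correct step much higher.
def toy_propose(path: Path) -> List[Candidate]:
    return [("good", 0.4), ("bad", 0.6)]


def toy_confidence(path: Path, step: str) -> float:
    return 0.9 if step == "good" else 0.1


if __name__ == "__main__":
    best_path, best_score = beam_search(
        toy_propose, toy_confidence, beam_width=1, max_steps=2
    )[0]
    print(best_path, round(best_score, 3))
```

In this toy setup, setting `alpha=0` reduces the search to ranking by generation probability alone, so the more probable but incorrect step is selected; with `alpha=0.5` the predictor's confidence outweighs the small probability gap and the correct step is kept. This illustrates how a confidence signal can stop an early error from propagating through later steps.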
