LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-step Arithmetics

Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Ana Brassard, Keisuke Sakaguchi, Kentaro Inui

Abstract

This study investigates the internal information flow of large language models (LLMs) while they perform chain-of-thought (CoT) style reasoning. Specifically, with a particular interest in the faithfulness of the CoT explanation to the LLMs' final answer, we explore (i) when the LLMs' answer is (pre)determined, in particular whether this happens before or after the CoT begins, and (ii) how strongly the information from the CoT causally affects the final answer. Our experiments with controlled arithmetic tasks reveal a systematic internal reasoning mechanism of LLMs. The models have not yet derived an answer at the moment the input is fed to them. Instead, they compute (sub-)answers on the fly while generating the reasoning chain. Therefore, the generated reasoning chains can be regarded as faithful reflections of the model's internal computation.

Paper Structure

This paper contains 48 sections, 3 equations, 111 figures, 23 tables.

Figures (111)

  • Figure 1: Using linear probes, we investigated at which point during the LLM's problem-solving process the value of each variable can be determined, illustrating the model's problem-solving process. Our analysis indicates that LLMs come up with (sub-)answers during CoT. This conclusion is also consistent with the findings from the causal intervention experiments.
  • Figure 2: Probing results for Qwen2.5-7B at the Level 3 task. The heatmaps in the lower section show the accuracy of probes computed on the evaluation set. Each cell shows the probing accuracy at token position $t$ and layer $l$. The upper part indicates the maximum probing accuracy achieved at each token position $t$. The input sequence below the line graphs is just an example; in the actual evaluation set, each variable name, number, and operator is randomly sampled from $(D, \Sigma, \{+, -\})$.
  • Figure 3: Error analysis of cases where Llama3.2-3B generated incorrect answers. The vertical axis represents the index of the transformer layer. The horizontal axis represents the tokens input to the model over time. The numbers highlighted in green represent the gold labels for the predictions, while those highlighted in red denote the values incorrectly generated by the model.
  • Figure 4: Overview of the causal intervention experiment. First, we perform normal inference (Clean run) and cache its hidden states. Subsequently, we evaluate whether the output changes by replacing some of the hidden states of a model solving a different problem with the cached hidden states.
  • Figure 5: Results of the causal intervention on Qwen2.5-7B. Each grid cell shows the success rate when the final answer $y$ ($\underline{\textcolor{gray}{\mathrm{A}=}\mathbf{6}}_{\mathbf{5}}$) is the target token.
  • ...and 106 more figures
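
The intervention procedure summarized in the Figure 4 caption (cache the hidden states of a clean run, then substitute them one at a time into a run on a different problem and check whether the output changes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy "model" is a simple causal accumulator rather than a transformer, and the names `run` and `patching_grid` are hypothetical and do not come from the paper or any library.

```python
# Toy sketch of a hidden-state patching experiment (not the authors' code).
# Assumption: a stand-in "model" where each layer adds the previous token's
# state to the current one, and the output is the last token's final state.

def run(tokens, n_layers=2, patch=None):
    """Run the toy model; return (all hidden states, output).

    patch maps (layer, position) -> value and overwrites that hidden state
    right after it is computed, so later layers see the patched value.
    """
    patch = patch or {}
    states = []
    cur = [float(x) for x in tokens]  # layer 0: "embeddings"
    for layer in range(n_layers + 1):
        if layer > 0:
            prev = states[-1]
            # Causal mixing: position t only sees positions <= t.
            cur = [prev[t] + (prev[t - 1] if t > 0 else 0.0)
                   for t in range(len(prev))]
        cur = [patch.get((layer, t), v) for t, v in enumerate(cur)]
        states.append(cur)
    return states, states[-1][-1]


def patching_grid(clean_tokens, other_tokens, n_layers=2):
    """For each (layer, position), patch one cached clean state into the
    run on other_tokens and record whether the output becomes the clean one."""
    clean_states, clean_out = run(clean_tokens, n_layers)  # clean run + cache
    grid = []
    for layer in range(n_layers + 1):
        row = []
        for t in range(len(other_tokens)):
            single_patch = {(layer, t): clean_states[layer][t]}
            _, out = run(other_tokens, n_layers, patch=single_patch)
            row.append(out == clean_out)
        grid.append(row)
    return grid


if __name__ == "__main__":
    for row in patching_grid([1, 2, 3], [4, 5, 6]):
        print(row)
```

In this toy setting only the patch at the last layer and last position reproduces the clean output, because every earlier state is mixed with unpatched neighbors before it reaches the output. Averaging such boolean grids over many problem pairs would give a per-cell success rate of the kind the Figure 5 caption describes.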