The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
Zihan Pengmei, Costas Mavromatis, Zhengyuan Shen, Yunyi Zhang, Vassilis N. Ioannidis, Huzefa Rangwala
TL;DR
This study investigates how chain-of-thought (CoT) supervision shapes learning dynamics in transformers, using a controlled set of symbolic reasoning tasks and a grokking framework. By contrasting direct-answer and CoT-guided training, it shows that CoT can accelerate generalization and enlarge expressivity on simpler tasks but may not overcome high algorithmic complexity, as in the list-intersection (Intersection) task. It also uncovers a transient unfaithfulness phase in which reasoning traces diverge from final answers before the two align. The authors formalize a three-parameter logistic model for learning curves and an Arrhenius-like second-order framework to interpret how task complexity and data distribution affect learning rates, with CoT effectively lowering learning barriers. Mechanistic analyses via linear probing and causal tracing reveal that CoT shifts computation to earlier, more distributed representations and alters causal pathways. These findings illuminate both the potential and the limits of CoT for improving reasoning in transformers, and they highlight the need for caution when using generated traces as explanations, especially for complex tasks or under limited training.
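The Arrhenius analogy can be made concrete with the textbook relation from chemical kinetics, in which a rate constant k depends exponentially on an activation barrier E_a. This section does not give the paper's exact formulation (including what "second-order" refers to), so the block below is only an illustrative assumption: it takes the same prefactor A and the same temperature-like scale T for both training regimes and shows why a lower barrier under CoT implies a faster learning rate.

```latex
k = A \exp\!\left(-\frac{E_a}{R T}\right)
\quad\Longrightarrow\quad
\frac{k_{\mathrm{CoT}}}{k_{\mathrm{direct}}}
  = \exp\!\left(\frac{E_{\mathrm{direct}} - E_{\mathrm{CoT}}}{R T}\right) > 1
\quad \text{whenever } E_{\mathrm{CoT}} < E_{\mathrm{direct}}.
```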
Abstract
Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking, pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. We train models under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we fit accuracy as a function of the logarithm of training steps with a three-parameter logistic curve, revealing how learning speed and curve shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace-unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, and only later align their reasoning traces with their answers. In summary, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show that CoT mechanistically alters internal transformer computation.
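The logistic learning-curve fit can be sketched in a few lines. The exact parameterization is not given in this section, so the sketch assumes accuracy(t) = plateau / (1 + exp(-rate * (log10 t - midpoint))), with a plateau accuracy, a rate (steepness), and a midpoint measured in log10 training steps. The names `logistic_accuracy` and `fit_logistic` are hypothetical, and the fitting procedure (a grid over the plateau, with the other two parameters from least squares on the logit-transformed curve) is one dependency-free choice, not necessarily the authors' method.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-6  # keeps the logit finite when accuracy hits 0 or the plateau


def logistic_accuracy(step: float, plateau: float, rate: float, midpoint: float) -> float:
    """Assumed (hypothetical) 3-parameter logistic in log10(step):
    plateau / (1 + exp(-rate * (log10(step) - midpoint)))."""
    z = rate * (math.log10(step) - midpoint)
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return plateau / (1.0 + math.exp(-z))
    e = math.exp(z)
    return plateau * e / (1.0 + e)


def fit_logistic(steps: Sequence[float], acc: Sequence[float]) -> Tuple[float, float, float]:
    """Return (plateau, rate, midpoint) with the lowest squared error in accuracy.

    For each candidate plateau on a 0.001 grid in [max(acc), 1.0], the assumed
    curve is linear after a logit transform, so rate and midpoint follow from
    ordinary least squares of logit(acc / plateau) on log10(step).
    """
    xs = [math.log10(s) for s in steps]
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    top = max(acc)
    best = (math.inf, 0.0, 0.0, 0.0)  # (sse, plateau, rate, midpoint)
    for i in range(1, 1001):
        plateau = i / 1000
        if plateau < top:
            continue  # the plateau cannot lie below an observed accuracy
        ys = []
        for a in acc:
            p = min(max(a / plateau, EPS), 1.0 - EPS)
            ys.append(math.log(p / (1.0 - p)))
        y_mean = sum(ys) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
        if abs(slope) < 1e-12:
            continue  # flat curve: midpoint is undefined
        intercept = y_mean - slope * x_mean
        midpoint = -intercept / slope  # logit = rate * (x - midpoint)
        sse = sum((logistic_accuracy(s, plateau, slope, midpoint) - a) ** 2
                  for s, a in zip(steps, acc))
        if sse < best[0]:
            best = (sse, plateau, slope, midpoint)
    return best[1], best[2], best[3]
```

For example, sampling this curve with plateau 0.95, rate 4 and midpoint 3 at 17 log-spaced steps from 10 to 100000 and passing the samples to `fit_logistic` recovers those three values. On noisy curves, a nonlinear least-squares routine such as `scipy.optimize.curve_fit` is the more standard choice; the grid-plus-logit approach is used here only to keep the sketch free of dependencies.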
