Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning

Mukul Singh, Ananya Singha, Arjun Radhakrishna, Sumit Gulwani

TL;DR

The paper investigates how reasoning traces (reasoning tokens) evolve during task-specific fine-tuning of reasoning systems, framing them as a working-memory analogue governed by the Four Stages of Competence. Through reinforcement-learning experiments across code-generation, math, regex synthesis, and logic tasks, it shows that reasoning-token length grows with performance, peaks at conscious competence, and then declines as knowledge becomes internalized, with task performance persisting even when reasoning is removed. It further proposes learning-stage metrics to diagnose training progress and guide early stopping, arguing that reasoning traces act as a valuable scaffold during acquisition. The findings offer a cognitive-science-informed lens for understanding model learning dynamics and provide practical signals to optimize training regimes for reasoning capabilities.

Abstract

We analyze reasoning in language models during task-specific fine-tuning and draw a parallel between reasoning tokens (the intermediate steps generated while solving a problem) and human working memory. Drawing from cognitive science, we align training dynamics with the Four Stages of Competence: models initially produce incorrect outputs without reasoning, then begin reasoning (but still fail), eventually reason effectively, and finally solve tasks without explicit reasoning. We find that reasoning token length expands as performance improves, peaks at the stage of conscious competence, then declines as the model internalizes the task. Notably, after training, models retain performance even when reasoning is removed, suggesting that reasoning scaffolded learning but is no longer needed. This progression offers actionable insights: reasoning token dynamics can serve as a signal for diagnosing training stage, identifying convergence, and guiding early stopping. We propose metrics to track this trajectory and argue that reasoning behavior is valuable for understanding and optimizing reasoning model training.
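The stage mapping and the peak-then-decline signal described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's metric: the function names, the `drop_ratio` and `min_accuracy` thresholds, and the decision rule are all assumptions chosen for illustration. Only the pairing of (reasoning present, answer correct) with the Four Stages of Competence follows the abstract.

```python
from typing import Sequence

# Mapping from (uses_reasoning, is_correct) to the Four Stages of Competence,
# following the order described in the abstract.
STAGES = {
    (False, False): "unconscious incompetence",  # wrong, no reasoning
    (True, False): "conscious incompetence",     # reasons, still wrong
    (True, True): "conscious competence",        # reasons effectively
    (False, True): "unconscious competence",     # correct without reasoning
}


def classify_stage(uses_reasoning: bool, is_correct: bool) -> str:
    """Label a single model output with a competence stage."""
    return STAGES[(uses_reasoning, is_correct)]


def past_reasoning_peak(
    lengths: Sequence[float],
    accuracies: Sequence[float],
    drop_ratio: float = 0.5,    # hypothetical threshold, not from the paper
    min_accuracy: float = 0.9,  # hypothetical threshold, not from the paper
) -> bool:
    """Heuristic early-stopping signal over per-checkpoint measurements.

    Returns True when mean reasoning length has fallen to at most
    `drop_ratio` of its earlier peak while accuracy stays at or above
    `min_accuracy` at the latest checkpoint.
    """
    if len(lengths) == 0 or len(lengths) != len(accuracies):
        raise ValueError("lengths and accuracies must be non-empty and aligned")
    peak = max(lengths)
    if peak <= 0:
        return False  # the model never produced reasoning tokens
    peak_index = list(lengths).index(peak)
    peaked_earlier = peak_index < len(lengths) - 1
    shrunk = lengths[-1] <= drop_ratio * peak
    accurate = accuracies[-1] >= min_accuracy
    return peaked_earlier and shrunk and accurate
```

In use, one would log mean reasoning length and accuracy at each checkpoint and call `past_reasoning_peak` on the running histories. The thresholds would need tuning per task, since the abstract does not state specific values.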

Paper Structure

This paper contains 19 sections, 4 figures.

Figures (4)

  • Figure 1: A map of cognitive memory structure.
  • Figure 2: Distribution of reasoning length (number of tokens) against task accuracy.
  • Figure 3: Distribution of reasoning and answer correctness across training steps.
  • Figure 4: Model performance on the training and held-out datasets, along with reasoning length.