Reason to Rote: Rethinking Memorization in Reasoning
Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank
TL;DR
The paper examines how language models can memorize noisy training labels while retaining generalizable reasoning. It uses two controlled synthetic tasks, Four-Digit Addition (FDA) and Two-Hop Relational Reasoning (THR), to dissect the interaction between memorization and generalization. The results show that memorization relies on existing reasoning mechanisms and distributed encodings rather than on a simple look-up from inputs to noisy labels. Key findings include a two-phase learning dynamic (the model first generalizes, then memorizes), strong overlap between the circuits used for generalization and for memorization, and a neuron-level mechanism in FDA, termed outlier heuristics, in which existing neuron activation patterns are slightly shifted to fit noisy labels. These results help explain benign memorization and show how reasoning components adapt to accommodate noisy data, with implications for understanding implicit regularization and designing robust models. Overall, memorization does not override reasoning but subtly reshapes it through distributed, architecture-aligned adaptations.
Abstract
Large language models readily memorize arbitrary training instances, such as label noise, yet they perform strikingly well on reasoning tasks. In this work, we investigate how language models memorize label noise, and why such memorization in many cases does not heavily affect their generalizable reasoning capabilities. Using two controllable synthetic reasoning datasets with noisy labels, four-digit addition (FDA) and two-hop relational reasoning (THR), we find that memorization relies on generalizable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorized noisy labels, and intervening on the reasoning process adversely affects memorization. We further show that memorization operates through distributed encoding, i.e., by aggregating various inputs and intermediate results, rather than by building a look-up mechanism from inputs to noisy labels. Moreover, our FDA case study reveals that memorization occurs via outlier heuristics, where existing neuron activation patterns are slightly shifted to fit noisy labels. Together, our findings suggest that memorization of label noise in language models builds on, rather than overrides, the underlying reasoning mechanisms, shedding light on the intriguing phenomenon of benign memorization.
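To make the two noisy-label settings concrete, the following is a minimal sketch of how such datasets could be generated. The abstract does not specify the data format, operand ranges, noise rate, or noise-injection procedure, so everything here is an assumption: operands are sampled uniformly, a fixed fraction of labels is corrupted, and a corrupted label is drawn uniformly from the wrong answers. The function names `make_fda_dataset` and `make_thr_dataset` and all of their parameters are hypothetical and are not the authors' code.

```python
import random


def make_fda_dataset(n, noise_rate=0.1, seed=0):
    """Hypothetical four-digit addition (FDA) data with label noise.

    Each example stores operands a and b, a label, and a flag marking
    whether the label was corrupted (kept only for later analysis).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.randint(1000, 9999)
        b = rng.randint(1000, 9999)
        correct = a + b
        noisy = rng.random() < noise_rate
        label = correct
        if noisy:
            # Assumption: a noisy label is any other value in the range
            # of possible sums (2000..19998), never the true sum.
            while label == correct:
                label = rng.randint(2000, 19998)
        data.append({"a": a, "b": b, "label": label, "noisy": noisy})
    return data


def make_thr_dataset(num_entities=100, num_relations=5, noise_rate=0.1, seed=0):
    """Hypothetical two-hop relational reasoning (THR) data with label noise.

    Atomic facts map (entity, relation) -> entity. A two-hop query
    (e, r1, r2) is answered by first resolving the bridge entity
    facts[(e, r1)], then looking up facts[(bridge, r2)].
    """
    rng = random.Random(seed)
    entities = [f"e{i}" for i in range(num_entities)]
    relations = [f"r{j}" for j in range(num_relations)]
    # Assumption: every (entity, relation) pair has exactly one target.
    facts = {(e, r): rng.choice(entities) for e in entities for r in relations}
    queries = []
    for (e, r1), bridge in facts.items():
        for r2 in relations:
            answer = facts[(bridge, r2)]
            noisy = rng.random() < noise_rate
            label = answer
            if noisy:
                # Replace the true answer with a different random entity.
                label = rng.choice([x for x in entities if x != answer])
            queries.append({"query": (e, r1, r2), "bridge": bridge,
                            "label": label, "noisy": noisy})
    return facts, queries
```

For example, `make_fda_dataset(1000, noise_rate=0.2)` yields examples whose label differs from `a + b` exactly when `noisy` is set. In the THR sketch, the `bridge` field records the intermediate entity, which corresponds to the kind of intermediate reasoning output the abstract reports models still computing when they recall a memorized noisy label.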
