The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Yihuai Hong, Dian Zhou, Meng Cao, Lei Yu, Zhijing Jin
TL;DR
This work identifies Linear Reasoning Features (LiReFs): linear directions in the residual stream of decoder-only transformers that mediate the balance between reasoning and memorization in LLMs. LiReFs are extracted as a difference of means between activations on reasoning-oriented and memory-oriented inputs, which supports both diagnostic visualization and causal intervention. At inference time, adding or ablating the LiReF direction, scaled by a scalar α, shifts the model toward more generalizable reasoning or toward memorization. Across four base models and six datasets, LiReFs consistently separate reasoning inputs from memory inputs, and their activation correlates with reasoning generalizability, as measured for example by reasoning scores. Inference-time LiReF interventions improve accuracy on reasoning tasks and reduce the misapplication of memory-based approaches, suggesting a mechanistic, transferable control knob for robust and interpretable generative reasoning in LLMs. The results point toward more predictable and controllable AI systems that exploit the internal geometry of activation spaces.
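The difference-of-means extraction and the α-scaled intervention described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the unit-normalization of the direction, and the use of projection removal for ablation are assumptions made for illustration, and the arrays stand in for residual-stream activations collected at a single layer.

```python
import numpy as np

def difference_of_means_direction(reasoning_acts, memory_acts):
    """Unit vector from the memory mean to the reasoning mean.

    Both inputs have shape (num_examples, hidden_dim). Normalizing the
    direction is an assumption of this sketch.
    """
    diff = reasoning_acts.mean(axis=0) - memory_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project(hidden, direction):
    """Scalar activation of each hidden state along the direction."""
    return hidden @ direction

def steer(hidden, direction, alpha):
    """Add alpha times the direction to every hidden state."""
    return hidden + alpha * direction

def ablate(hidden, direction):
    """Remove the component of each hidden state along the direction."""
    return hidden - project(hidden, direction)[..., None] * direction

# Synthetic stand-ins for activations (hypothetical data, not model outputs).
rng = np.random.default_rng(0)
reasoning_acts = rng.normal(size=(100, 16)) + 1.0
memory_acts = rng.normal(size=(100, 16))

direction = difference_of_means_direction(reasoning_acts, memory_acts)
hidden = rng.normal(size=(4, 16))
steered = steer(hidden, direction, alpha=2.0)   # positive alpha: toward the reasoning mean
ablated = ablate(hidden, direction)             # zero activation along the direction
```

In this sketch a positive α moves hidden states toward the reasoning-side mean and a negative α moves them toward the memory-side mean. Which layer to intervene on and how large α should be are not specified in this summary.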
Abstract
Large language models (LLMs) excel on a variety of reasoning benchmarks, but previous studies suggest they sometimes struggle to generalize to unseen questions, potentially due to over-reliance on memorized training examples. However, the precise conditions under which LLMs switch between reasoning and memorization during text generation remain unclear. In this work, we provide a mechanistic understanding of LLMs' reasoning-memorization dynamics by identifying a set of linear features in the model's residual stream that govern the balance between genuine reasoning and memory recall. These features not only distinguish reasoning tasks from memory-intensive ones but can also be manipulated to causally influence model performance on reasoning tasks. Additionally, we show that intervening in these reasoning features helps the model more accurately activate the most relevant problem-solving capabilities during answer generation. Our findings offer new insights into the underlying mechanisms of reasoning and memory in LLMs and pave the way for the development of more robust and interpretable generative AI systems.
