The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Pratyusha Sharma, Jordan T. Ash, Dipendra Misra
TL;DR
The paper shows that selective, post-training rank reduction of Transformer weight matrices, especially in late-layer MLPs, can improve reasoning and factual accuracy without additional data or training. By replacing selected matrices with their low-rank approximations, LASER acts as a denoising mechanism that suppresses noisy higher-order components while preserving useful lower-order information. The approach yields substantial gains on CounterFact and related NLP benchmarks, generalizes across models, and extends to non-text domains such as reinforcement learning, though it can modestly worsen language modeling perplexity. These findings challenge the notion that more parameters and data are always beneficial and offer a training-free path to better reasoning in large language models. They also raise questions about how higher-order components encode information and why later-layer MLPs are particularly amenable to improvement, pointing to future work on model internals and cross-architecture effects.
Abstract
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data. This work, however, demonstrates the surprising result that it is often possible to significantly improve the performance of LLMs by selectively removing higher-order components of their weight matrices. This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed, and requires no additional parameters or data. We show extensive experiments demonstrating the generality of this finding across language models and datasets, and provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.
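To make the core operation concrete, the following is a minimal sketch of replacing a single weight matrix with a low-rank approximation. It assumes the approximation is obtained by truncated singular value decomposition, keeping the largest singular values and discarding the rest. The function name `low_rank_approx`, the `keep_fraction` parameter, and the random stand-in matrix are illustrative assumptions, not taken from the paper; choosing which layer and matrix to reduce is not shown.

```python
import numpy as np


def low_rank_approx(W, keep_fraction):
    """Return a rank-k approximation of W via truncated SVD.

    keep_fraction is a hypothetical knob: the fraction of the matrix's
    maximum possible rank to retain (at least one component is kept).
    """
    # Thin SVD: W = U @ diag(S) @ Vt, singular values sorted descending.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    max_rank = S.shape[0]
    k = max(1, int(keep_fraction * max_rank))
    # Keep only the k largest singular components; drop the rest.
    return (U[:, :k] * S[:k]) @ Vt[:k, :]


# Stand-in for a trained weight matrix (random here, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))

W_reduced = low_rank_approx(W, keep_fraction=0.25)
print(W_reduced.shape)                    # same shape as W
print(np.linalg.matrix_rank(W_reduced))   # rank after reduction
```

The reduced matrix has the same shape as the original, so it can be swapped in without changing the surrounding architecture or adding parameters, which matches the abstract's description of a post-training intervention.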
