Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity
Bowen Zhang, Meiyi Wang, Harold Soh
TL;DR
This work tackles mode collapse in post-trained LLMs by proposing Selective Layer Restoration (SLR), a training-free method that restores a contiguous interval of layers to their pre-trained weights, recovering diversity while largely preserving quality. To choose the restoration interval efficiently, the authors introduce Constrained Random Character (CRC), a proxy task with explicit validity sets that quantifies the diversity–quality trade-off and guides interval selection. Across three tasks (creative writing, open-ended QA, and multi-step reasoning) and three model families (Llama, Qwen, Gemma), SLR achieves substantial diversity gains with minimal quality loss and is shown to be complementary to decoding- and prompting-based diversification. The results support a modular view of post-trained LLMs: diversity can be maintained through targeted weight-space interventions rather than full retraining. Limitations related to scaling, along with directions for future refinement, are also discussed.
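The core SLR operation can be pictured as merging two checkpoints that share identical parameter names. The following is a minimal sketch under stated assumptions, not the authors' implementation. Weights are held in plain name-to-value mappings such as PyTorch `state_dict`s. Decoder-layer parameters are assumed to carry a `layers.<index>.` segment in their names, which is a common Hugging Face convention but is not guaranteed for every model. The interval is treated as inclusive. Embeddings, the final norm, and the output head are assumed to stay post-trained. The helper names `layer_index` and `restore_layers` are hypothetical.

```python
import re
from typing import Dict, Mapping, Optional, TypeVar

T = TypeVar("T")

# ASSUMPTION: decoder-layer parameter names contain "layers.<idx>."
# (e.g. "model.layers.12.mlp.up_proj.weight"); other models may differ.
LAYER_KEY = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def layer_index(name: str) -> Optional[int]:
    """Return the decoder-layer index encoded in a parameter name, or None."""
    m = LAYER_KEY.search(name)
    return int(m.group(1)) if m else None


def restore_layers(post: Mapping[str, T], pre: Mapping[str, T],
                   start: int, end: int) -> Dict[str, T]:
    """Hypothetical helper: build hybrid weights where decoder layers
    start..end (inclusive) come from the pre-trained checkpoint and every
    other parameter (embeddings, other layers, head) stays post-trained."""
    if set(post) != set(pre):
        raise ValueError("checkpoints must share the same parameter names")
    if not 0 <= start <= end:
        raise ValueError("need 0 <= start <= end")
    hybrid: Dict[str, T] = {}
    for name, value in post.items():
        idx = layer_index(name)
        in_interval = idx is not None and start <= idx <= end
        hybrid[name] = pre[name] if in_interval else value
    return hybrid


if __name__ == "__main__":
    # Toy "checkpoints" with string placeholders instead of tensors.
    post = {f"model.layers.{i}.w": f"post{i}" for i in range(4)}
    post["lm_head.weight"] = "post_head"
    pre = {k: v.replace("post", "pre") for k, v in post.items()}
    print(restore_layers(post, pre, 1, 2))
```

Every value is copied from one of two checkpoints with matching keys, so the hybrid keeps the post-trained model's architecture and parameter count. With PyTorch modules one might, for some hypothetical interval, call `post_model.load_state_dict(restore_layers(post_model.state_dict(), base_model.state_dict(), 10, 15))`.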
Abstract
Post-training improves the instruction-following ability and helpfulness of large language models (LLMs) but often reduces generation diversity, leading to repetitive outputs in open-ended settings, a phenomenon known as mode collapse. Motivated by evidence that LLM layers play distinct functional roles, we hypothesize that mode collapse can be localized to specific layers and that restoring a carefully chosen range of layers to their pre-trained weights can recover diversity while maintaining high output quality. To validate this hypothesis and decide which layers to restore, we design a proxy task, Constrained Random Character (CRC), with an explicit validity set and a natural diversity objective. Results on CRC reveal a clear diversity–validity trade-off across restoration ranges and identify configurations that increase diversity with minimal quality loss. Based on these findings, we propose Selective Layer Restoration (SLR), a training-free method that restores selected layers in a post-trained model to their pre-trained weights. The result is a hybrid model with the same architecture and parameter count, so it incurs no additional inference cost. Across three tasks (creative writing, open-ended question answering, and multi-step reasoning) and three model families (Llama, Qwen, and Gemma), we find that SLR consistently and substantially improves output diversity while maintaining high output quality.
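The abstract does not specify how CRC measures validity and diversity. One plausible instantiation, assumed here purely for illustration, scores a batch of sampled answers in two ways. The first score is the fraction of answers that fall inside the validity set. The second is the Shannon entropy of the valid answers, normalized by the log of the validity-set size so that a uniform spread over all valid characters scores 1.0. Interval selection is then framed as choosing the most diverse restoration interval whose validity stays above a threshold. The functions `crc_scores` and `pick_interval`, the entropy-based diversity measure, and the threshold rule are all hypothetical and are not the paper's definitions.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Set, Tuple

Interval = Tuple[int, int]


def crc_scores(samples: Sequence[str], valid: Set[str]) -> Tuple[float, float]:
    """Hypothetical CRC scoring: return (validity, diversity) for one batch.

    validity  = fraction of samples that belong to the validity set
    diversity = entropy of the valid samples / log(|valid|), in [0, 1]
    """
    if not samples:
        raise ValueError("need at least one sample")
    hits = [s for s in samples if s in valid]
    validity = len(hits) / len(samples)
    if not hits or len(valid) < 2:
        return validity, 0.0
    n = len(hits)
    # Shannon entropy written as sum p * log(1/p) over observed valid answers.
    entropy = sum((c / n) * math.log(n / c) for c in Counter(hits).values())
    return validity, entropy / math.log(len(valid))


def pick_interval(results: Dict[Interval, Tuple[float, float]],
                  min_validity: float) -> Optional[Interval]:
    """Among intervals with validity >= min_validity, return the most diverse."""
    ok = {k: v for k, v in results.items() if v[0] >= min_validity}
    if not ok:
        return None
    return max(ok, key=lambda k: ok[k][1])


if __name__ == "__main__":
    valid = set("ABCD")
    print(crc_scores(["A", "B", "C", "D"], valid))  # spread over all valid chars
    print(crc_scores(["A", "A", "A", "A"], valid))  # valid but collapsed
    # Made-up (validity, diversity) pairs per interval, NOT results from the paper.
    sweep = {(0, 3): (0.95, 0.40), (4, 7): (0.90, 0.80), (8, 11): (0.60, 0.95)}
    print(pick_interval(sweep, min_validity=0.85))
```

In a full sweep, each candidate interval would first be turned into a hybrid model and sampled on CRC prompts before its scores enter `pick_interval`. The hard validity threshold is only one way to encode "minimal quality loss". A weighted combination of the two scores would be an equally plausible alternative.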
