Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity

Bowen Zhang, Meiyi Wang, Harold Soh

TL;DR

This work tackles mode collapse in post-trained LLMs by proposing Selective Layer Restoration (SLR), a training-free method that restores a contiguous interval of layers to their pre-trained weights to recover diversity with minimal loss in quality. To choose the restoration interval efficiently, the authors introduce Constrained Random Character (CRC), a proxy task with explicit validity sets that quantifies the diversity–quality trade-off and guides interval selection. Across three tasks (creative writing, open-ended QA, and multi-step reasoning) and three model families (Llama, Qwen, Gemma), SLR achieves substantial diversity gains with minimal quality loss and is shown to be complementary to decoding- and prompting-based diversification. The results support a modular view of post-trained LLMs: diversity can be recovered through targeted weight-space interventions rather than full retraining. The authors also discuss limitations related to scaling and directions for future refinement.

Abstract

Post-training improves instruction-following and helpfulness of large language models (LLMs) but often reduces generation diversity, which leads to repetitive outputs in open-ended settings, a phenomenon known as mode collapse. Motivated by evidence that LLM layers play distinct functional roles, we hypothesize that mode collapse can be localized to specific layers and that restoring a carefully chosen range of layers to their pre-trained weights can recover diversity while maintaining high output quality. To validate this hypothesis and decide which layers to restore, we design a proxy task, Constrained Random Character (CRC), with an explicit validity set and a natural diversity objective. Results on CRC reveal a clear diversity-validity trade-off across restoration ranges and identify configurations that increase diversity with minimal quality loss. Based on these findings, we propose Selective Layer Restoration (SLR), a training-free method that restores selected layers in a post-trained model to their pre-trained weights, yielding a hybrid model with the same architecture and parameter count and therefore no additional inference cost. Across three different tasks (creative writing, open-ended question answering, and multi-step reasoning) and three different model families (Llama, Qwen, and Gemma), we find that SLR consistently and substantially improves output diversity while maintaining high output quality.
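The restoration step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes both checkpoints are available as flat name-to-tensor mappings (as a PyTorch `state_dict` would be) and that transformer block parameters are named `...layers.<index>.<rest>`, which is a common convention but is not stated in the source. The function name `restore_layers` and the inclusive interval arguments `start` and `end` are hypothetical.

```python
import re
from typing import Any, Dict

# Assumption: block parameters are keyed like "model.layers.<idx>.<rest>".
LAYER_KEY = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def restore_layers(
    post_trained: Dict[str, Any],
    pre_trained: Dict[str, Any],
    start: int,
    end: int,
) -> Dict[str, Any]:
    """Return a hybrid mapping: post-trained weights everywhere, except that
    layers in the inclusive interval [start, end] come from the pre-trained model."""
    if start > end:
        raise ValueError("start must not exceed end")
    if post_trained.keys() != pre_trained.keys():
        raise ValueError("checkpoints must share the same parameter names")

    hybrid = dict(post_trained)  # shallow copy; values are not modified
    for name in post_trained:
        match = LAYER_KEY.search(name)
        if match and start <= int(match.group(1)) <= end:
            hybrid[name] = pre_trained[name]
    return hybrid
```

Because the hybrid mapping has exactly the same keys as the inputs, it can be loaded into the unchanged architecture, which matches the claim that the parameter count and inference cost stay the same. Parameters outside the blocks (embeddings, final norm, output head) keep their post-trained values in this sketch; whether the paper treats them that way is not specified here.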

Paper Structure (34 sections, 10 equations, 7 figures, 12 tables)

This paper contains 34 sections, 10 equations, 7 figures, 12 tables.

Figures (7)

  • Figure 1: Pre-trained LLMs are diverse but have weak instruction-following ability, leading to low-quality responses. Post-trained LLMs show strong instruction adherence and high output quality but suffer from mode collapse. Selective layer restoration (SLR) restores the diverse modes existing in pre-trained LLMs while maintaining high output quality.
  • Figure 2: CRC trade-off landscape. Each point is a restoration interval $[i,j]$ on the Pareto front (filtered to $Q_{\mathrm{CRC}}\!\ge\!0.9$), plotted by quality $Q_{\mathrm{CRC}}$ (mean validity) and diversity $D_{\mathrm{CRC}}$ (mean entropy). The number next to each marker is the number of restored layers $\ell=j-i+1$. Pareto frontiers are smooth, and along fixed-start slices, restoring more layers increases diversity at a gradual cost in validity; post-trained models (diamonds $\diamondsuit$) sit in the high-validity, low-diversity region.
  • Figure 3: Main experiment results. We compare the post-trained model, Proxy-soup, and SLR across Llama, Qwen, and Gemma on creative writing, open-ended QA, and reasoning. (a) Creative writing results: top row shows quality (LLM-judge score) and bottom row shows diversity (embedding-based dissimilarity). (b) Open-ended QA results: quality is measured by precision, while diversity is measured by the entropy over correct answers and Coverage-$n$ (fraction of unique correct answers generated). (c) Reasoning results: Pass@$k$ as a function of the sampling budget $k$. Overall, SLR with CRC-guided interval selection consistently improves diversity with minimal quality loss and yields higher Pass@$k$ across $k$ on all model families, providing affirmative answers to our research questions Q1 to Q3.
  • Figure 4: Creative writing results at $T=1.5$. Top row: judge-based quality scores (higher is better). Bottom row: semantic embedding-based diversity score (higher is better). We compare the post-trained model, Proxy-soup, and SLR across three models on joke, poem, and story generation. Overall, the performance gain of SLR persists under higher-temperature settings.
  • Figure 5: Open-ended QA results at $T=1.5$. Quality is measured by precision (left; higher is better). Diversity is measured by the entropy of the distribution over correct generated answers (middle; higher is better) and Coverage-$n$, the fraction of unique correct answers generated at least once (right; higher is better). We compare the post-trained model, Proxy-soup, and SLR across the three models. Overall, the performance gain of SLR persists under higher-temperature settings.
  • ...and 2 more figures
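
The open-ended QA metrics named in the captions above (precision, entropy over correct answers, and Coverage-$n$) can be illustrated with a small sketch. It is written under assumptions that the captions do not settle: answers are compared by exact string match against a known set of correct answers, entropy uses the natural logarithm, and Coverage-$n$ is normalized by the size of that correct-answer set. The function name `qa_metrics` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple


def qa_metrics(samples: Iterable[str], valid: Set[str]) -> Tuple[float, float, float]:
    """Return (precision, entropy, coverage) for a batch of sampled answers.

    precision: fraction of samples that are in the valid set
    entropy:   Shannon entropy (nats) of the empirical distribution over
               the correct samples only
    coverage:  fraction of the valid set generated at least once
    """
    samples = list(samples)
    correct = [s for s in samples if s in valid]
    precision = len(correct) / len(samples) if samples else 0.0

    counts = Counter(correct)
    total = len(correct)
    # p * log(1/p) summed over distinct correct answers; 0.0 if none are correct.
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())

    coverage = len(counts) / len(valid) if valid else 0.0
    return precision, entropy, coverage
```

A model that always returns the same correct answer scores a precision of 1 but an entropy of 0 and low coverage, which is the mode-collapse pattern the figures contrast with SLR. The same validity-plus-entropy pairing mirrors the CRC quality and diversity scores in Figure 2, although the exact CRC definitions may differ from this sketch.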