Superficial Self-Improved Reasoners Benefit from Model Merging

Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Leyan Pan, Soroush Vosoughi, Wenke Lee

TL;DR

The paper identifies a risk in LLM self-improvement where gains on in-domain reasoning come at the cost of out-of-domain generalization due to memorization. It analyzes layer-wise contributions and finds a mismatch: reasoning-critical layers receive small updates while less important layers change more, driving superficial improvements. The authors propose Iterative Model Merging (IMM), optionally augmented with DARE masking, to fuse base and self-improved models across iterations, preserving generalization while enabling reasoning gains. Empirical results across multiple datasets and model scales show that IMM mitigates model collapse, maintains or improves OOD performance, and extends to distillation scenarios, highlighting its practical potential for robust self-improving systems.

Abstract

As scaled language models (LMs) approach human-level reasoning capabilities, self-improvement emerges as a solution for synthesizing a high-quality data corpus. While previous research has identified model collapse as a risk in self-improvement, where model outputs become increasingly deterministic, we discover a more fundamental challenge: the superficial self-improved reasoners phenomenon. In particular, our analysis reveals that even when LMs show improved in-domain (ID) reasoning accuracy, they actually compromise their generalized reasoning capabilities on out-of-domain (OOD) tasks, because the gains come from memorization rather than genuine learning. Through a systematic investigation of LM architecture, we discover that during self-improvement, LM weight updates are concentrated in less reasoning-critical layers, leading to superficial learning. To address this, we propose Iterative Model Merging (IMM), a method that strategically combines weights from original and self-improved models to preserve generalization while incorporating genuine reasoning improvements. Our approach effectively mitigates both LM collapse and superficial learning, moving towards more stable self-improving systems.
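
The weight-merging idea can be illustrated with a minimal sketch. It is not the authors' implementation: this summary does not state the exact merge rule, so the linear interpolation coefficient `alpha`, the function names, and the dict-of-lists weight layout are all assumptions made for illustration. The optional masking step follows the usual description of DARE (randomly drop a fraction of the delta parameters and rescale the survivors by 1/(1 - drop rate)).

```python
import random


def dare_mask(delta, drop_rate, rng):
    """Zero each delta entry with probability drop_rate; rescale the rest."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * scale for d in delta]


def merge(base, tuned, alpha=0.5, drop_rate=0.0, seed=0):
    """Merge two weight dicts: base + alpha * (optionally masked) delta.

    `alpha` is an assumed interpolation coefficient, not a value from the paper.
    """
    rng = random.Random(seed)
    merged = {}
    for name, base_w in base.items():
        # Delta parameters: what self-improvement changed relative to the base.
        delta = [t - b for t, b in zip(tuned[name], base_w)]
        if drop_rate > 0.0:
            delta = dare_mask(delta, drop_rate, rng)
        merged[name] = [b + alpha * d for b, d in zip(base_w, delta)]
    return merged


if __name__ == "__main__":
    base = {"layer0": [1.0, 2.0]}
    tuned = {"layer0": [3.0, 6.0]}
    print(merge(base, tuned, alpha=0.5))
```

With `alpha=0` the merge returns the base weights and with `alpha=1` (and no masking) it returns the self-improved weights, so intermediate values trade off retained generalization against newly learned behavior.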

Paper Structure

This paper contains 52 sections, 8 equations, 15 figures, 11 tables.

Figures (15)

  • Figure 1: The Superficial Self-Improved Reasoners phenomenon is mitigated by iterative model merging. Our method improves both ID and OOD reasoning performance.
  • Figure 2: Superficial Self-Improved Reasoners. The model's performance improves only on in-domain reasoning datasets, while its generalized reasoning capability on out-of-domain reasoning datasets degrades.
  • Figure 3: Layer importance scores of the strong reasoning model Qwen2.5-1.5B-Math on the BookCorpus (left) and MATH (right) datasets. For reasoning (MATH), the early and late layers are more important and the middle layers less so. For the non-reasoning task (BookCorpus), the middle layers are more important.
  • Figure 4: Weight changes from SFT of Qwen2.5-1.5B on self-improvement MATH data (left) and from fully post-training Qwen2.5-1.5B into Qwen2.5-1.5B-Math using 700B tokens of real data (right).
  • Figure 5: The overall framework: (a) The model generates chain-of-thought (CoT) answers for the given questions, and incorrect answers are filtered out using the ground truth. The remaining correct answers are used for SFT to self-improve the model. (b) IMM iteratively applies SFT to the model and merges each self-improved model with the base model to balance reasoning enhancement and generalization.
  • ...and 10 more figures
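
The framework described in the Figure 5 caption (generate CoT answers, filter by ground truth, SFT, merge with the base model, repeat) can be sketched as a short loop. This is only an outline under stated assumptions: `generate`, `sft`, and `merge` are hypothetical callables supplied by the caller, the sample format is invented for illustration, and the choice to fine-tune the merged model in the next iteration is an assumption that the caption does not confirm.

```python
def filter_correct(samples, ground_truth):
    """Keep only generated samples whose answer matches the ground truth."""
    return [s for s in samples if s["answer"] == ground_truth[s["question"]]]


def imm_loop(base_weights, generate, sft, merge, questions, ground_truth,
             iterations):
    """Iterate: generate -> filter -> SFT -> merge with the base model.

    generate(weights, questions) -> list of {"question", "answer"} dicts
    sft(weights, samples)        -> fine-tuned weights
    merge(base, tuned)           -> merged weights
    All three are hypothetical stand-ins, not a real library API.
    """
    current = base_weights
    for _ in range(iterations):
        samples = generate(current, questions)
        kept = filter_correct(samples, ground_truth)  # drop incorrect answers
        tuned = sft(current, kept)                    # self-improvement step
        current = merge(base_weights, tuned)          # always merge with base
    return current
```

Passing the three steps in as callables keeps the sketch independent of any particular training or merging library.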