Beyond Learning: A Training-Free Alternative to Model Adaptation

Namkyung Yoon; Kyeonghyun Yoo; Wooyong Jung; Sanghong Kim; Hwangnam Kim

TL;DR

Proposes a training-free model transplantation framework: layer-wise activation evaluation identifies internal modules whose activations change in a task-specific way, and those modules are swapped directly between structurally compatible models without retraining. Modules are chosen by activation-based selection and compatibility criteria, and gains are quantified with two metrics, $TIR$ and $Recovery$, in cross-generation and base-vs-tuned settings, where transplanting localized modules yields substantial improvements. The results support a modular view of LM computation: task-relevant behavior can be transferred by swapping a small subset of linear modules (FFN projections and attention projections) with minimal disruption to the rest of the model. This work introduces model transplantation as a new research direction for rapid capability transfer and controlled rollback in evolving language models.

Abstract

Despite continuous research and development, newer language models sometimes underperform their previous versions. Existing remedies are resource-intensive, which highlights the need for alternatives that can be applied immediately. We assume that each language model contains local modules suited to specific functions. First, using activation-based analysis, this work identifies a set of modules that show consistent, localized activation changes under an inference workload. We then transplant an internal module that is properly activated for a specific task into the target model, producing immediate and measurable functional changes without additional training or fine-tuning. To demonstrate the effectiveness of the transplant technique experimentally, we quantify the relationship between transplant strength and performance improvement under different conditions for two language models. In the cross-generation setting, transplanting activation-selected modules can substantially improve the underperforming model, reaching up to twice the target baseline and achieving gap-based recovery above 100%. In transplant experiments between a base model and its instruction-tuned counterpart, transplantation moves the underperforming model toward the stronger baseline, yielding up to about 2.33 times the target baseline, with gap-based recovery reaching 100% in the best case. These results show that meaningful capability transfer can be achieved by transplanting highly localized modules within language models. Overall, this work provides empirical evidence for task-localized modularity in language models and presents a new research area: model transplantation.
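To make the two reported quantities concrete, the following is a minimal sketch under assumed definitions, since this page does not give the formulas. It assumes that the improvement ratio ($TIR$) is the post-transplant score divided by the target model's own baseline, and that gap-based $Recovery$ is the fraction of the donor-target gap closed by the transplant. The function names and all scores are hypothetical illustrations, not results from the paper.

```python
def tir(transplanted_score: float, target_baseline: float) -> float:
    """Assumed definition: post-transplant score relative to the target's baseline."""
    if target_baseline <= 0:
        raise ValueError("target_baseline must be positive")
    return transplanted_score / target_baseline


def recovery(transplanted_score: float,
             target_baseline: float,
             donor_baseline: float) -> float:
    """Assumed definition: share of the donor-target gap closed by the transplant.

    1.0 means the target now matches the donor; values above 1.0 mean
    the transplanted target exceeds the donor's baseline.
    """
    gap = donor_baseline - target_baseline
    if gap == 0:
        raise ValueError("donor and target baselines are equal; gap is undefined")
    return (transplanted_score - target_baseline) / gap


# Made-up scores (e.g. number of correct answers), purely illustrative.
# Case A: the target doubles its score and overshoots the donor.
tir_a = tir(20, 10)            # 2.0
rec_a = recovery(20, 10, 18)   # 1.25, i.e. above 100%

# Case B: the target exactly reaches the donor's baseline.
tir_b = tir(7, 3)              # about 2.33
rec_b = recovery(7, 3, 7)      # 1.0, i.e. 100%
```

Under these assumed definitions, a ratio of 2 together with recovery above 100% simply means the transplanted model ended up scoring higher than the donor, while recovery of exactly 100% means it matched the donor.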

Paper Structure (15 sections, 9 equations, 3 figures, 5 tables)

Introduction
Preliminary
Optimization-Based Model Adaptation
Motivation
Model Transplant
Pre-Transplant Diagnosis
Selection Criterion for Transplanting
Model Transplantation Procedure
Experiment
Implementation and Dataset Details
Results and Analysis
Results between Model Generations
Results between Base and Tuned Models
Discussion
Conclusion

Figures (3)

Figure 1: Illustration of module-level transplantability in language models.
Figure 2: The TIR and Recovery of Phi-3 $\leftrightarrow$ Phi-3.5 as a function of the number of transplanted modules $K$ for different decoding lengths (max_new_tokens).
Figure 3: The TIR and Recovery of Gemma-2-2B-IT $\leftrightarrow$ Gemma-2-2B as a function of the number of transplanted modules $K$ for different decoding lengths (max_new_tokens).
