Retention analysis of edited knowledge after fine-tuning
Fufang Wen, Shichang Zhang
TL;DR
The paper examines how well knowledge introduced by knowledge editing (KE) methods survives subsequent fine-tuning in LLMs. It systematically evaluates multiple KE methods across four downstream tasks and shows that edited knowledge is typically more prone to forgetting than intrinsic knowledge, consistent with an elasticity-theory view that ties retention to pretraining data volume. The authors introduce two practical remedies, paraphrase-augmented edits and selective layer freezing during fine-tuning, that restore or even exceed intrinsic retention under many settings. They also provide a formal relation linking retention dynamics to data volumes: $\frac{d\gamma_{p_{\theta}}^{\mathcal{D}_{2}/\mathcal{D}}}{d l} = \Theta\left(-k\frac{d\gamma_{p_{\theta}}^{\mathcal{D}_{1}/\mathcal{D}}}{d l}\right)$ with $l = \frac{|\mathcal{D}_3|}{|\mathcal{D}_2|} \ll 1$ and $k = \frac{|\mathcal{D}_1|}{|\mathcal{D}_2|} \gg 1$. The findings indicate that paraphrase-rich edits and targeted freezing make retention more robust, which can guide practical deployment of KE methods in downstream pipelines.
Abstract
Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. In parallel, LLMs are frequently fine-tuned for a wide range of downstream tasks. However, the effect of fine-tuning on previously edited knowledge remains poorly understood. In this work, we systematically investigate how different fine-tuning objectives interact with various model editing techniques. Our findings show that edited knowledge is substantially more susceptible to forgetting during fine-tuning than intrinsic knowledge acquired through pre-training. This analysis highlights a key limitation of current editing approaches and suggests that evaluating edit robustness under downstream fine-tuning is critical for their practical deployment. We further find that knowledge retention can be significantly improved either by augmenting edited knowledge with paraphrases or by freezing the layers associated with edited content during the fine-tuning stage, offering insights for developing more robust editing algorithms.
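To make the layer-freezing remedy concrete, the following is a minimal sketch of how one might decide which parameters to hold fixed during fine-tuning, given the indices of the layers an edit modified. It is not the authors' implementation: the function names, the `layers.<index>.` naming pattern, and the toy parameter list are all assumptions chosen for illustration, and real models use different naming schemes.

```python
import re

# Hypothetical naming scheme: per-layer parameters contain "layers.<index>.".
# Real checkpoints may name their layers differently.
LAYER_PATTERN = re.compile(r"\blayers\.(\d+)\.")


def layer_index(param_name):
    """Return the layer index encoded in a parameter name, or None if absent."""
    match = LAYER_PATTERN.search(param_name)
    return int(match.group(1)) if match else None


def trainable_mask(param_names, edited_layers):
    """Map each parameter name to True (updated in fine-tuning) or False (frozen).

    Parameters belonging to a layer touched by the edit are frozen;
    everything else, including parameters outside any layer, stays trainable.
    """
    edited = set(edited_layers)
    return {name: layer_index(name) not in edited for name in param_names}


# Toy parameter list (illustrative only) with an edit assumed to live in layer 1.
param_names = [
    "embed.weight",
    "layers.0.mlp.weight",
    "layers.1.mlp.weight",
    "layers.2.mlp.weight",
    "head.weight",
]
mask = trainable_mask(param_names, edited_layers=[1])
frozen = [name for name, trainable in mask.items() if not trainable]
print(frozen)  # ['layers.1.mlp.weight']
```

In a PyTorch training script, a mask like this could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad = mask[name]` before building the optimizer. Which layers count as "associated with edited content" depends on the editing method and is not specified here.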
