LoKI: Low-damage Knowledge Implanting of Large Language Models
Runyu Wang, Peng Ping, Zhengyu Guo, Xiaoye Zhang, Quan Shi, Liting Zhou, Tianbo Ji
TL;DR
This work tackles catastrophic forgetting during fine-tuning of large language models by introducing LoKI, a parameter-efficient framework that preserves pretrained knowledge while enabling task-specific adaptation. LoKI has three parts: Knowledge Vector Attribution (KVA), which uses Integrated Gradients to quantify the contribution of individual knowledge vectors in the feed-forward network (FFN) layers; a Layer-Balanced Strategy, which allocates equal trainable capacity to each transformer layer; and an implanting step, which updates only the selected vectors (optionally via LoRA). In experiments on ToolACE and LB Reranker, LoKI retains general capabilities better than full fine-tuning and other PEFT methods while matching or surpassing their task-specific performance. The analysis finds that both high- and low-contribution vectors cluster in similar layers, which suggests a structured knowledge hierarchy, and that layer-aware updates are crucial to mitigating forgetting. Overall, LoKI is a practical, interpretable approach to sustainable LLM customization that bridges mechanistic interpretability with fine-tuning objectives and may combine well with existing tuning techniques such as LoRA.
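The three stages named above (attribution, layer-balanced selection, implanting) can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the function names `integrated_gradients`, `layer_balanced_select`, and `implant_step` are hypothetical, a plain scalar function stands in for the model output, gradients are taken numerically rather than by autograd, and whether the lowest- or highest-scoring vectors are made trainable is exposed as a flag because the summary does not state it.

```python
from typing import Callable, Dict, List


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate Integrated Gradients of f at x with a midpoint Riemann sum.

    Gradients are central finite differences, so no autograd library is needed.
    """
    totals = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(len(x)):
            up, down = point[:], point[:]
            up[i] += eps
            down[i] -= eps
            totals[i] += (f(up) - f(down)) / (2 * eps)
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def layer_balanced_select(
    scores: Dict[int, List[float]],
    per_layer: int,
    pick_lowest: bool = True,  # assumption: the selection direction is a free choice here
) -> Dict[int, List[int]]:
    """Pick the same number of vector indices in every layer."""
    selected = {}
    for layer, layer_scores in scores.items():
        order = sorted(
            range(len(layer_scores)),
            key=lambda i: layer_scores[i],
            reverse=not pick_lowest,
        )
        selected[layer] = sorted(order[:per_layer])
    return selected


def implant_step(
    weights: Dict[int, List[List[float]]],
    grads: Dict[int, List[List[float]]],
    selected: Dict[int, List[int]],
    lr: float,
) -> Dict[int, List[List[float]]]:
    """Apply one gradient step to the selected vectors; all others stay frozen."""
    updated = {}
    for layer, rows in weights.items():
        trainable = set(selected.get(layer, []))
        updated[layer] = [
            [w - lr * g for w, g in zip(row, grads[layer][i])]
            if i in trainable
            else list(row)
            for i, row in enumerate(rows)
        ]
    return updated
```

In this sketch each layer contributes exactly `per_layer` trainable vectors regardless of how its scores compare with other layers, which is the sense in which the selection is layer-balanced. A real setup would compute the scores from FFN activations with an autograd-based attribution tool and apply the update through the optimizer or a LoRA adapter.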
Abstract
Fine-tuning adapts pretrained models for specific tasks but poses the risk of catastrophic forgetting (CF), where critical knowledge from pretraining is overwritten. To address CF in a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that builds on recent mechanistic understanding of how knowledge is stored in transformer architectures. We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI preserves general capabilities significantly better than these methods. At the same time, its task-specific performance is comparable to or even surpasses that of full parameter fine-tuning and these PEFT methods across various model architectures. Our work bridges mechanistic insights into LLMs' knowledge storage with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.
