
LoKI: Low-damage Knowledge Implanting of Large Language Models

Runyu Wang, Peng Ping, Zhengyu Guo, Xiaoye Zhang, Quan Shi, Liting Zhou, Tianbo Ji

TL;DR

This work tackles catastrophic forgetting during fine-tuning of large language models by introducing LoKI, a parameter-efficient framework that preserves pretrained knowledge while enabling task-specific adaptation. LoKI combines three components: Knowledge Vector Attribution (KVA), which uses Integrated Gradients to quantify the contribution of individual FFN knowledge vectors; a Layer-Balanced Strategy, which allocates equal trainable capacity to every transformer layer; and an implanting step that updates only the selected vectors (optionally via LoRA). Through ToolACE and LB Reranker experiments, the authors show that LoKI retains general capabilities better than full fine-tuning and other PEFT methods while matching or surpassing their task-specific performance. The findings reveal that high- and low-contribution vectors cluster in similar layers, suggesting a structured knowledge hierarchy, and that layer-aware updates are crucial to mitigating forgetting. Overall, LoKI offers a practical, interpretable approach to sustainable LLM customization that bridges mechanistic interpretability with fine-tuning objectives and can be combined with existing tuning techniques such as LoRA.
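The attribution-and-selection pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each layer's knowledge vectors are taken to be the rows of a down-projection `w_down` (shape `d_ff x d_model`), the attribution target is a hypothetical scalar readout `F(a) = u . tanh(a @ w_down)` standing in for the model's real output, Integrated Gradients uses a zero baseline and a midpoint Riemann sum, a vector's score is its mean absolute attribution over samples, and the layer-balanced step is assumed to mark the lowest-scoring `fraction` of vectors in every layer as trainable. All function names are illustrative.

```python
import numpy as np


def readout(a, w_down, u):
    """Hypothetical scalar target F(a) = u . tanh(a @ w_down) for one layer."""
    return float(u @ np.tanh(a @ w_down))


def readout_grad(a, w_down, u):
    """Analytic gradient dF/da of readout() with respect to the activations."""
    return w_down @ (u * (1.0 - np.tanh(a @ w_down) ** 2))


def integrated_gradients(a, w_down, u, steps=64):
    """IG with a zero baseline: a * average gradient along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint Riemann sum
    grads = np.stack([readout_grad(alpha * a, w_down, u) for alpha in alphas])
    return a * grads.mean(axis=0)


def kva_scores(acts, w_downs, u):
    """Mean |IG| per knowledge vector; acts has shape (n_samples, n_layers, d_ff)."""
    n_samples, n_layers, d_ff = acts.shape
    scores = np.zeros((n_layers, d_ff))
    for layer in range(n_layers):
        for s in range(n_samples):
            ig = integrated_gradients(acts[s, layer], w_downs[layer], u)
            scores[layer] += np.abs(ig)
    return scores / n_samples


def layer_balanced_select(scores, fraction):
    """Mark the same number of lowest-scoring vectors as trainable in every layer."""
    d_ff = scores.shape[1]
    k = max(1, int(round(fraction * d_ff)))
    lowest = np.argsort(scores, axis=1)[:, :k]  # per-layer, not global, ranking
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, lowest, True, axis=1)
    return mask


rng = np.random.default_rng(0)
n_samples, n_layers, d_ff, d_model = 8, 4, 16, 6
w_downs = rng.normal(scale=0.3, size=(n_layers, d_ff, d_model))
u = rng.normal(size=d_model)
acts = rng.normal(size=(n_samples, n_layers, d_ff))  # stand-in FFN activations

scores = kva_scores(acts, w_downs, u)
mask = layer_balanced_select(scores, fraction=0.25)
print(mask.sum(axis=1))  # → [4 4 4 4]
```

Because `k` is computed from `d_ff` rather than from the global score distribution, every layer receives the same trainable budget even if low scores happen to concentrate in a few layers, which is the property the Layer-Balanced Strategy is meant to enforce.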

Abstract

Fine-tuning adapts pretrained models to specific tasks but risks catastrophic forgetting (CF), in which critical knowledge acquired during pretraining is overwritten. To address CF within a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that builds on recent mechanistic understanding of how knowledge is stored in transformer architectures. We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI preserves general capabilities significantly better than the compared methods. At the same time, its task-specific performance is comparable to, or even surpasses, that of full-parameter fine-tuning and these PEFT methods across various model architectures. Our work bridges mechanistic insights into LLMs' knowledge storage with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.
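The mechanistic view the abstract relies on treats an FFN block as a key-value memory: its output is a sum of value ("knowledge") vectors, each scaled by one intermediate activation. The sketch below checks that decomposition on a toy two-layer FFN and then applies a masked gradient step that changes only a chosen subset of knowledge vectors, which is the general idea behind implanting into selected vectors. It is an illustration under assumptions, not the paper's training code: a softplus activation replaces the gated activations real LLMs use, knowledge vectors are stored as rows of `w_down` (PyTorch's `nn.Linear` stores `weight` as `(out_features, in_features)`, so they would be columns there), the loss is squared error, the optimizer is plain SGD, and the trainable indices `[1, 5]` are hypothetical.

```python
import numpy as np


def ffn(h, w_up, w_down):
    """Toy FFN: returns the output and the intermediate activations."""
    a = np.logaddexp(0.0, h @ w_up)  # softplus; one coefficient per knowledge vector
    return a @ w_down, a             # a @ w_down == sum_i a[i] * w_down[i]


def masked_sgd_step(w_down, grad, trainable_rows, lr):
    """SGD step that updates only the selected rows (knowledge vectors)."""
    mask = np.zeros(w_down.shape[0], dtype=bool)
    mask[trainable_rows] = True
    return w_down - lr * grad * mask[:, None]


def loss(out, target):
    return 0.5 * float(np.sum((out - target) ** 2))


rng = np.random.default_rng(0)
d_model, d_ff = 4, 8
h = rng.normal(size=d_model)
w_up = rng.normal(size=(d_model, d_ff))
w_down = rng.normal(size=(d_ff, d_model))
target = rng.normal(size=d_model)

out, a = ffn(h, w_up, w_down)
# Key-value view: the FFN output is an activation-weighted sum of knowledge vectors.
assert np.allclose(out, sum(a[i] * w_down[i] for i in range(d_ff)))

grad = np.outer(a, out - target)  # dLoss/dw_down for loss = 0.5 * ||out - target||^2
trainable = [1, 5]                # hypothetical output of a selection step
new_w_down = masked_sgd_step(w_down, grad, trainable, lr=0.01)

frozen = [i for i in range(d_ff) if i not in trainable]
print(np.array_equal(new_w_down[frozen], w_down[frozen]))  # → True
new_out, _ = ffn(h, w_up, new_w_down)
print(loss(new_out, target) < loss(out, target))
```

Full fine-tuning corresponds to an all-true mask; restricting the mask is what keeps the remaining vectors, and whatever they encode, exactly unchanged.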

Paper Structure

This paper contains 35 sections, 10 equations, 5 figures, 16 tables.

Figures (5)

  • Figure 1: Schematic illustration of the staged fine-tuning process in LoKI.
  • Figure 2: Heatmaps of the top $5\%$ KVA results across all 32 layers of Llama3.1-8B-Instruct. The vertical axis denotes node indices, and the horizontal axis denotes layer indices. The upper (red-tinted) heatmap illustrates the distribution of high-contribution node positions, while the lower (blue-tinted) heatmap illustrates the distribution of low-contribution node positions. Color intensity (log-scale) reflects the density of nodes within each category, with darker colors indicating higher density. Heatmaps for additional models are provided in Appendix B.
  • Figure 3: Heatmaps of the top $10\%$ KVA results across all 24 layers of Qwen2.5-0.5B-Instruct. The upper (red-tinted) map highlights the distribution of high-contribution node positions, while the lower (blue-tinted) map highlights the distribution of low-contribution node positions. Color intensity (log-scale) indicates the density of nodes in each category, with darker colors representing higher density.
  • Figure 4: Heatmaps of the top $5\%$ KVA results across all 32 layers of Llama2-7B, following the same visualization settings as in Figure 3.
  • Figure 5: Heatmaps of the top $5\%$ KVA results across all 32 layers of Llama3.1-8B-Instruct, following the same visualization settings as in Figure 3.
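The captions describe density heatmaps of where high- and low-contribution nodes fall. One plausible way to build such a map is sketched below, assuming the top fraction is taken globally across all layers (which is what allows positions to cluster in particular layers), node indices are grouped into equal-width bins for the vertical axis, and `np.log1p` supplies the log-scale color intensity. Random scores and small shapes stand in for real KVA output, and `position_density` is a hypothetical helper rather than code from the paper.

```python
import numpy as np


def position_density(scores, fraction, n_bins, highest=True):
    """Count globally selected node positions per (node-index bin, layer).

    scores:   (n_layers, d_ff) attribution scores
    fraction: share of all nodes to select (e.g. 0.05 for the top 5%)
    Returns an (n_bins, n_layers) count matrix: rows are node-index bins
    (vertical axis), columns are layers (horizontal axis).
    """
    n_layers, d_ff = scores.shape
    k = max(1, int(round(fraction * scores.size)))
    order = np.argsort(scores, axis=None)  # ascending over all layers jointly
    chosen = order[-k:] if highest else order[:k]
    layers, nodes = np.unravel_index(chosen, scores.shape)
    bins = nodes * n_bins // d_ff
    counts = np.zeros((n_bins, n_layers), dtype=int)
    np.add.at(counts, (bins, layers), 1)  # unbuffered: repeated cells accumulate
    return counts


rng = np.random.default_rng(0)
scores = rng.random((32, 1024))  # toy stand-in for one model's KVA scores
high = position_density(scores, fraction=0.05, n_bins=16)
low = position_density(scores, fraction=0.05, n_bins=16, highest=False)
shade_high = np.log1p(high)  # log-scale intensity: darker = denser
shade_low = np.log1p(low)
print(high.sum(), low.sum())  # → 1638 1638
```

Passing `shade_high` and `shade_low` to any image-plotting routine with a red and a blue colormap, respectively, would give the upper and lower panels described in the captions.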