Time Sensitive Knowledge Editing through Efficient Finetuning
Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li
TL;DR
The paper addresses how to keep time-sensitive factual knowledge in LLMs up to date. Edits are modeled as $(s, r, o) \to (s, r, o')$ for knowledge modification and $(s, r, \emptyset) \to (s, r, o')$ for knowledge injection. The authors argue that traditional locate-and-edit approaches face stability and scalability issues, and instead fine-tune with the objective $L_{FT} = \frac{1}{|D_M|} \sum_{d \in D_M} L(d; \Phi_0, \Delta \Phi)$, where the base weights $\Phi_0$ are frozen and only the adapters $\Delta \Phi$ are learned. Performance is benchmarked on ChronoEdit, a large dataset of ~15k time-sensitive edits, and a layer sweep shows that the middle transformer layers are particularly influential for multi-hop QA. Empirically, LoRA-based PEFT applied to MLP (and optionally attention) layers can match or exceed full fine-tuning with far fewer trainable parameters and better locality, making PEFT a practical path for time-sensitive knowledge editing (KE).
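To make the notation above concrete, the following is a minimal sketch of how an edit and the averaged objective might be represented. The `Edit` class, the `finetune_objective` function, the placeholder facts, and the toy loss are all illustrative assumptions, not the paper's code or data. In practice the per-example loss $L(d; \Phi_0, \Delta \Phi)$ would come from a language model whose base weights are frozen.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Edit:
    """One edit (s, r, o) -> (s, r, o'). old_object=None plays the role of the empty set."""
    subject: str
    relation: str
    old_object: Optional[str]
    new_object: str

    @property
    def kind(self) -> str:
        # No prior object means the fact is new to the model: an injection.
        return "injection" if self.old_object is None else "modification"


def finetune_objective(edits: Sequence[Edit], loss_fn: Callable[[Edit], float]) -> float:
    """Mean per-example loss over the edit set (written D_M in the TL;DR)."""
    if not edits:
        raise ValueError("the edit set must not be empty")
    return sum(loss_fn(d) for d in edits) / len(edits)


# Placeholder facts, purely illustrative.
edits = [
    Edit("Organization X", "chief executive", "Person A", "Person B"),  # modification
    Edit("Person C", "employer", None, "Organization Y"),               # injection
]


def toy_loss(d: Edit) -> float:
    # Stand-in for a model loss; a real one would depend on the adapter weights.
    return 3.0 if d.kind == "modification" else 1.0


mean_loss = finetune_objective(edits, toy_loss)
```

Representing the missing object as `None` keeps modification and injection in one type, so the same training loop can consume both kinds of example.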
Abstract
Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up to date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and inject new knowledge into LLMs. Existing locate-and-edit knowledge editing (KE) methods suffer from two limitations. First, LLMs edited by such methods generally perform poorly on complex queries that require multi-hop reasoning. Second, the long run-time of such locate-and-edit methods makes them infeasible for large-scale KE in practice. In this paper, we explore Parameter-Efficient Fine-Tuning (PEFT) techniques as an alternative for KE. We curate a more comprehensive temporal KE dataset with both knowledge update and knowledge injection examples for benchmarking KE performance. We further probe the effect of fine-tuning on a range of layers in an LLM for the multi-hop QA task. We find that PEFT performs better than locate-and-edit techniques for time-sensitive knowledge edits.
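As a rough illustration of what fine-tuning adapters on a range of layers involves, here is a dependency-free sketch of LoRA, the PEFT technique named in the TL;DR. It keeps a weight matrix frozen and adds a trainable low-rank product to it. The function names, the layer range, and the matrix sizes are hypothetical choices for this example, not the paper's configuration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, scale: float = 1.0) -> Vector:
    """Compute (W0 + scale * B A) x without modifying the frozen W0."""
    base = matvec(w0, x)               # frozen path
    update = matvec(b, matvec(a, x))   # low-rank path: B (A x)
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def adapted_layers(first: int, last: int) -> range:
    """Hypothetical helper: indices of the layers that receive adapters (inclusive)."""
    return range(first, last + 1)


# Toy 2x2 layer at rank 1. B starts at zero, as in the standard LoRA
# initialization, so the adapted layer initially matches the frozen one.
W0 = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]     # rank x d_in
B = [[0.0], [0.0]]    # d_out x rank
x = [1.0, 1.0]
assert lora_forward(W0, A, B, x) == matvec(W0, x)

# Hypothetical sizes: a 4096 x 4096 projection adapted at rank 8 in layers 10..19.
per_layer = lora_param_count(4096, 4096, 8)
total_trainable = per_layer * len(adapted_layers(10, 19))
full_per_layer = 4096 * 4096
```

With these assumed sizes, each adapter trains `rank * (d_in + d_out)` values instead of `d_in * d_out`, which is where the parameter savings of this family of methods come from.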
