Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Haowen Pan, Xiaozhi Wang, Yixin Cao, Zenglin Shi, Xun Yang, Juanzi Li, Meng Wang
TL;DR
This paper tackles the poor editing locality of locate-then-edit methods that rely on causal tracing by introducing Fine-grained Neuron-level Knowledge Editing (FiNE). FiNE precisely identifies and updates targeted neurons within feed-forward networks, achieving superior locality and efficiency while maintaining editing success across multiple LLMs (GPT-J, LLaMA-2, LLaMA-3). The approach combines neuron-level contribution scoring with an update objective that pairs an editing loss with KL-divergence and repetition-penalty terms, and it uses layer freezing to protect linguistic capabilities. Empirical results on the KnowEdit benchmark show FiNE outperforming existing locate-then-edit methods on locality and portability, while modifying substantially fewer parameters and editing faster. The work advances the interpretability and reliability of knowledge editing in LLMs and suggests avenues for safer, more scalable updates to the knowledge stored in large models.
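Neuron-level contribution scoring can be illustrated with a small, dependency-free sketch. It rests on an assumption: each feed-forward neuron is scored by its activation multiplied by the dot product of its output (value) vector with the unembedding row of the target token. This is a logit-lens-style attribution, and the paper's exact scoring formula may differ. The function names `neuron_contributions` and `select_neurons` and the toy dimensions are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def neuron_contributions(
    activations: Vector,          # a_i: activation of each FFN neuron at the final token
    value_vectors: List[Vector],  # v_i: output-projection vector of neuron i (length d_model)
    unembedding: List[Vector],    # one row per vocabulary token (length d_model)
    target_token: int,            # vocabulary index of the token being attributed
) -> List[float]:
    """Hypothetical score a_i * (v_i . u_target): neuron i's direct push on the target logit."""
    u_t = unembedding[target_token]
    return [a * sum(x * y for x, y in zip(v, u_t)) for a, v in zip(activations, value_vectors)]


def select_neurons(
    layers: List[Tuple[Vector, List[Vector]]],  # per layer: (activations, value_vectors)
    unembedding: List[Vector],
    target_token: int,
    k: int,
) -> List[Tuple[int, int]]:
    """Return the (layer, neuron) pairs with the k highest contribution scores across all layers."""
    scored = []
    for layer_idx, (acts, values) in enumerate(layers):
        scores = neuron_contributions(acts, values, unembedding, target_token)
        for neuron_idx, score in enumerate(scores):
            scored.append((score, layer_idx, neuron_idx))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(layer_idx, neuron_idx) for _, layer_idx, neuron_idx in scored[:k]]


# Toy example: two layers, 2-dimensional hidden states, 2-token vocabulary.
layers = [
    ([1.0, 2.0, -1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),  # layer 0: three neurons
    ([3.0], [[0.5, 0.0]]),                                     # layer 1: one neuron
]
unembedding = [[1.0, 0.0], [0.0, 1.0]]
print(select_neurons(layers, unembedding, target_token=0, k=2))  # [(1, 0), (0, 0)]
```

Ranking neurons across all layers, rather than selecting one whole module, is what makes the localization fine-grained: only the few (layer, neuron) pairs that most directly promote the target token become candidates for editing.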
Abstract
Knowledge editing aims to update outdated information in Large Language Models (LLMs). A representative line of work is locate-then-edit methods, which typically employ causal tracing to identify the modules responsible for recalling factual knowledge about entities. However, we find that these methods are often sensitive only to changes in the subject entity, making them less effective at adapting to changes in relations. This limitation results in poor editing locality, which can lead to the persistence of irrelevant or inaccurate facts and ultimately compromise the reliability of LLMs. We attribute this issue to insufficiently precise knowledge localization. To address it, we propose Fine-grained Neuron-level Knowledge Editing (FiNE), a method that enhances editing locality without affecting overall success rates. By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing. Quantitative experiments demonstrate that FiNE efficiently achieves better overall performance than existing techniques, providing new insights into the localization and modification of knowledge within LLMs.
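The idea of modifying only specific neurons can be made concrete with a toy sketch. The sketch is an assumption, not the paper's implementation: a single FFN layer produces h = Σ a_i v_i, the logits are U h, and gradient descent on the editing loss -log p(target) updates only the value vectors of the selected neurons, leaving every other neuron frozen. It deliberately omits the KL-divergence and repetition-penalty terms and layer freezing. The function `edit_selected_neurons` and its `lr` and `steps` parameters are hypothetical.

```python
import math
from typing import Iterable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ffn_output(acts: Vector, values: List[Vector]) -> Vector:
    # h = sum_i a_i * v_i
    return [sum(a * v[j] for a, v in zip(acts, values)) for j in range(len(values[0]))]


def next_token_probs(acts: Vector, values: List[Vector], unembedding: List[Vector]) -> Vector:
    h = ffn_output(acts, values)
    return softmax([sum(u_j * h_j for u_j, h_j in zip(u, h)) for u in unembedding])


def edit_selected_neurons(
    acts: Vector,
    values: List[Vector],
    unembedding: List[Vector],
    target: int,
    selected: Iterable[int],
    lr: float = 0.5,
    steps: int = 20,
) -> List[Vector]:
    """Gradient descent on -log p(target), updating only the value vectors of `selected` neurons."""
    selected = list(selected)
    values = [list(v) for v in values]  # work on a copy; unselected neurons stay frozen
    d_model = len(values[0])
    for _ in range(steps):
        p = next_token_probs(acts, values, unembedding)
        # For L = -log p[target]: dL/dlogits = p - onehot(target), so dL/dh = U^T (p - onehot).
        grad_h = [
            sum((p[k] - (1.0 if k == target else 0.0)) * unembedding[k][j] for k in range(len(unembedding)))
            for j in range(d_model)
        ]
        for i in selected:
            # dL/dv_i = a_i * dL/dh; only selected neurons receive this update.
            values[i] = [v - lr * acts[i] * g for v, g in zip(values[i], grad_h)]
    return values


acts = [1.0, 2.0, -1.0]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unembedding = [[1.0, 0.0], [0.0, 1.0]]
new_values = edit_selected_neurons(acts, values, unembedding, target=0, selected={0})
print(next_token_probs(acts, values, unembedding)[0] < next_token_probs(acts, new_values, unembedding)[0])  # True
print(new_values[1:] == values[1:])  # True: neurons 1 and 2 are untouched
```

Because only one value vector changes, the edit touches a small fraction of the layer's parameters, which mirrors the motivation for neuron-level rather than module-level editing.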
