Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject
Zenghao Duan, Wenbin Duan, Zhiyi Yin, Yinghan Shen, Shaoling Jing, Jie Zhang, Huawei Shen, Xueqi Cheng
TL;DR
This work tackles updating entity-centric knowledge in LLMs by formalizing Same-Subject Editing and introducing the $\text{S}^2\text{RKE}$ benchmark to study edits across multiple related facts for a single subject. It reveals a phenomenon called related knowledge perturbation in popular locate-then-edit methods, where subsequent edits interfere with earlier ones due to over-reliance on subject-derived keys in the MLP downsampling layer. The evidence is a high cosine similarity between the keys of edits that share a subject and a reduced Efficacy Success for the first edit. The findings are demonstrated across multiple models (e.g., GPT-2 XL, GPT-J, LLaMA-2-7B) and editing methods, highlighting a significant gap in current approaches and motivating the search for editing strategies that decouple edits from single-subject cues. The work underlines the practical importance of robust same-subject editing for coherent, multi-attribute knowledge updates as real-world information changes. The keys in question take the form $k_* = \frac{1}{N} \sum_{i=1}^N \mathcal{K}(x_i \oplus p)$, an average over $N$ inputs $x_i \oplus p$.
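The key averaging and the cosine-similarity check described above can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not the authors' code: `embed`, `key_fn`, the weight matrix `W`, the example prompts, and the prefixes are all hypothetical stand-ins for a real model. The one structural assumption, taken from ROME-style methods, is that the key is read at the last subject token of a causal model, so tokens after the subject cannot influence it.

```python
import zlib
import numpy as np

DIM = 64
# Hypothetical fixed weights standing in for the model's key computation.
W = np.random.default_rng(0).standard_normal((DIM, DIM)) / np.sqrt(DIM)

def embed(token: str) -> np.ndarray:
    """Deterministic pseudo-embedding for a token (toy stand-in)."""
    seed = zlib.crc32(token.encode("utf-8"))
    return np.random.default_rng(seed).standard_normal(DIM)

def key_fn(tokens: list[str], subject_end: int) -> np.ndarray:
    """Toy K(.): key read at the last subject token of a causal model,
    so only tokens[0..subject_end] contribute."""
    context = np.mean([embed(t) for t in tokens[: subject_end + 1]], axis=0)
    return np.tanh(W @ context)

def average_key(prefixes, prompt, subject_end):
    """k_* = (1/N) * sum_i K(x_i (+) p), with (+) as concatenation."""
    keys = [key_fn(x + prompt, len(x) + subject_end) for x in prefixes]
    return np.mean(keys, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical prefixes x_i and prompts p; the subject spans tokens 0..1.
prefixes = [["Indeed", ","], ["In", "short", ","], ["Note", "that"]]
edit_a = ["Marie", "Curie", "was", "born", "in"]   # same subject, relation 1
edit_b = ["Marie", "Curie", "worked", "at"]        # same subject, relation 2
edit_c = ["Alan", "Turing", "was", "born", "in"]   # different subject

k_a = average_key(prefixes, edit_a, subject_end=1)
k_b = average_key(prefixes, edit_b, subject_end=1)
k_c = average_key(prefixes, edit_c, subject_end=1)

cos_same = cosine(k_a, k_b)  # two edits on the same subject
cos_diff = cosine(k_a, k_c)  # edits on different subjects
print(f"same subject: {cos_same:.4f}, different subject: {cos_diff:.4f}")
```

In this toy setting the two same-subject edits receive essentially the same key because the relation text never enters the key computation. That is the kind of key collision the summary associates with related knowledge perturbation. A real model would yield high rather than perfect similarity.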
Abstract
Knowledge editing has become a promising approach for efficiently and precisely updating knowledge embedded in large language models (LLMs). In this work, we focus on Same-Subject Editing, which involves modifying multiple attributes of a single entity to ensure comprehensive and consistent updates to entity-centric knowledge. Through preliminary observation, we identify a significant challenge: current state-of-the-art editing methods struggle when tasked with editing multiple related knowledge pieces for the same subject. To address the lack of relevant editing data for identical subjects in traditional benchmarks, we introduce the $\text{S}^2\text{RKE}$ (Same-Subject Related Knowledge Editing) benchmark. Our extensive experiments reveal that only mainstream locate-then-edit methods, such as ROME and MEMIT, exhibit "related knowledge perturbation," where subsequent edits interfere with earlier ones. Further analysis reveals that these methods over-rely on subject information, neglecting other critical factors, resulting in reduced editing effectiveness.
