Stable Knowledge Editing in Large Language Models
Zihao Wei, Liang Pang, Hanxing Ding, Jingcheng Deng, Huawei Shen, Xueqi Cheng
TL;DR
This paper tackles the instability of knowledge editing in large language models by challenging the localization assumption and introducing StableKE, a data-centric approach that uses semantic paraphrase enhancement (SPE) and contextual description enrichment (CDE) to augment knowledge descriptions and their surrounding context. The paper pairs StableKE with KEBench, a tree-structured, multi-hop knowledge editing benchmark that evaluates four stability dimensions: edited knowledge, multi-hop reasoning, unrelated knowledge, and general capabilities. Empirical results show that StableKE outperforms prior methods on large-scale batch and sequential edits, better maintains unrelated knowledge, and preserves multi-hop reasoning and instruction-following abilities, including on ChatGPT. The work also demonstrates that data quality and diverse augmentation are crucial for robust editing. It notes limitations in sequential edits and some multi-hop gaps, which point to future improvements, and suggests broader applicability, including to non-public models via finetuning APIs.
Abstract
Efficient knowledge editing of large language models is crucial for replacing obsolete information or incorporating specialized knowledge on a large scale. However, previous methods implicitly assume that knowledge is localized and isolated within the model, an assumption that oversimplifies the interconnected nature of model knowledge. The localization assumption results in incomplete knowledge editing, whereas the isolation assumption may impair both other knowledge and general abilities. Together, they make the performance of knowledge editing methods unstable. To transcend these assumptions, we introduce StableKE, a method that adopts a novel perspective based on knowledge augmentation rather than knowledge localization. To overcome the expense of human labeling, StableKE integrates two automated knowledge augmentation strategies: the Semantic Paraphrase Enhancement strategy, which diversifies knowledge descriptions to facilitate the teaching of new information to the model, and the Contextual Description Enrichment strategy, which expands the surrounding knowledge to prevent the forgetting of related information. StableKE surpasses other knowledge editing methods, demonstrating stability on both edited knowledge and multi-hop knowledge, while also preserving unrelated knowledge and general abilities. Moreover, StableKE can edit knowledge on ChatGPT.
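To make the augmentation perspective concrete, the following is a minimal sketch of how training texts for one edit might be assembled from the two strategies. It is not the authors' implementation. The `Edit` class, the fixed paraphrase templates, the function names, and the example fact are all hypothetical, and the templates stand in for the automated generation that the abstract describes without specifying.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    """One knowledge edit: (subject, relation) should now map to new_object."""
    subject: str
    relation: str
    new_object: str
    # Hypothetical: related facts about the subject that should not be forgotten.
    context_facts: list = field(default_factory=list)


# Hypothetical fixed templates standing in for automatically generated paraphrases.
PARAPHRASE_TEMPLATES = [
    "The {relation} of {subject} is {new_object}.",
    "{new_object} is the {relation} of {subject}.",
    "When asked about the {relation} of {subject}, the answer is {new_object}.",
]


def semantic_paraphrases(edit):
    """SPE-style step (sketch): describe the edited fact in several ways."""
    return [
        template.format(
            subject=edit.subject,
            relation=edit.relation,
            new_object=edit.new_object,
        )
        for template in PARAPHRASE_TEMPLATES
    ]


def contextual_descriptions(edit):
    """CDE-style step (sketch): keep surrounding facts in the training data."""
    return [f"About {edit.subject}: {fact}" for fact in edit.context_facts]


def build_training_texts(edits):
    """Combine both kinds of augmented text into one fine-tuning corpus."""
    texts = []
    for edit in edits:
        texts.extend(semantic_paraphrases(edit))
        texts.extend(contextual_descriptions(edit))
    return texts


# Made-up example edit, for illustration only.
example = Edit(
    subject="Example Corp",
    relation="chief executive",
    new_object="Jane Doe",
    context_facts=["It is headquartered in Springfield."],
)
corpus = build_training_texts([example])
```

In this sketch the resulting `corpus` would be passed to an ordinary fine-tuning procedure. The point is only that the edit is expressed through data rather than through locating and modifying specific parameters.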
