Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng
TL;DR
This work addresses the gap in knowledge editing by targeting unstructured knowledge, which dominates real-world data. It introduces UnKE, a method that uses non-local block key-value storage across transformer blocks and cause-driven optimization that edits the last token without term localization, enabling robust edits in verbose, context-rich text. A new benchmark, UnKEBench, evaluates unstructured edits with metrics spanning lexical, semantic, factual, and general ability, and shows that UnKE outperforms strong baselines on both unstructured and structured editing tasks, with strong batch and sequential editing capabilities. The approach provides a practical path to timely, reliable knowledge updates in large language models while preserving context and preventing catastrophic forgetting.
Abstract
Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. Techniques such as "local layer key-value storage" and "term-driven optimization", used in previous methods like MEMIT, are not effective for handling unstructured knowledge. To address these challenges, we propose a novel Unstructured Knowledge Editing method, namely UnKE, which extends previous assumptions in the layer dimension and the token dimension. Firstly, in the layer dimension, we propose non-local block key-value storage to replace local layer key-value storage, increasing the representation ability of key-value pairs and incorporating attention layer knowledge. Secondly, in the token dimension, we replace "term-driven optimization" with "cause-driven optimization", which edits the last token directly while preserving context, avoiding the need to locate terms and preventing the loss of context information. Results on the newly proposed unstructured knowledge editing dataset (UnKEBench) and on traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines. In addition, UnKE has robust batch editing and sequential editing capabilities.
