DocMEdit: Towards Document-Level Model Editing

Li Zeng, Zeming Liu, Chong Feng, Heyan Huang, Yuhang Guo

TL;DR

This work tackles the gap between traditional model editing benchmarks and real-world needs by introducing document-level model editing with DocMEdit, a large-scale dataset of 37,990 items where inputs and outputs are document-level and edits involve multiple facts. It combines document change data from Wikipedia, entity-based fact collection, and Wikidata-aligned knowledge graphs to enable both internal parameter edits and retrieval-based updates. Comprehensive experiments across diverse LLMs and baselines reveal that existing editing methods struggle with document-level edits, especially with longer contexts, longer facts, and multiple concurrent edits, and they exhibit substantial side effects. The study offers a suite of novel evaluation metrics and analysis that highlight key challenges and suggest potential strategies, underscoring the practical relevance and urgency of advancing document-level model editing for real-world deployment.

Abstract

Model editing aims to correct errors and outdated knowledge in large language models (LLMs) at minimal cost. Prior research has proposed a variety of datasets to assess the effectiveness of model editing methods. However, most existing datasets only require models to output short phrases or sentences and overlook the widespread existence of document-level tasks in the real world, which raises doubts about their practical usability. To address this limitation and promote the application of model editing in real-world scenarios, we propose the task of document-level model editing. To tackle the challenges of this task and enhance model capabilities in practical settings, we introduce DocMEdit, a dataset focused on document-level model editing, characterized by document-level inputs and outputs, extrapolation, and multiple facts within a single edit. We propose a series of evaluation metrics and experiments. The results show that document-level model editing poses challenges for existing model editing methods.

Paper Structure

This paper contains 71 sections, 8 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: An example of DocMEdit. The input and output of DocMEdit are both document-level contents. Model editing should inject multiple facts to be edited into the model, enabling the edited model to output the updated document.
  • Figure 2: The construction process of DocMEdit. In the Document Change Computation, we calculate the updates of documents in Wikipedia between two time points and retain those documents that exhibit entity updates. In the Facts Collection, based on the newly added entities within the documents, we identify the newly added sentences mentioning these entities and extract them as supporting facts. In the Knowledge Graph Extraction, we extract structured knowledge graphs and impose constraints based on Wikidata relations.
  • Figure 3: Result of RQ1a. The x-axis represents the context length of the data, while the y-axis represents the corresponding DE.
  • Figure 4: Result of RQ1b. The x-axis represents the length of the facts to be edited, while the y-axis represents the corresponding EE.
  • Figure 5: Result of RQ2a. The x-axis represents the number of edits corresponding to each document, while the y-axis represents its DE.
  • ...and 5 more figures
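
The first two stages in the Figure 2 caption (Document Change Computation and Facts Collection) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the regex-based sentence splitter, and the substring-based entity matching are all assumptions made for illustration, and the candidate entity list is assumed to be supplied by some external entity recognizer.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def added_sentences(old_doc, new_doc):
    # Sentences present in the newer snapshot but absent from the older one.
    old = set(split_sentences(old_doc))
    return [s for s in split_sentences(new_doc) if s not in old]


def supporting_facts(old_doc, new_doc, entities):
    # Hypothetical helper: keep entities that appear only in the newer snapshot,
    # then map each one to the newly added sentences that mention it.
    new_entities = [e for e in entities if e in new_doc and e not in old_doc]
    facts = {}
    for sentence in added_sentences(old_doc, new_doc):
        for entity in new_entities:
            if entity in sentence:
                facts.setdefault(entity, []).append(sentence)
    return facts


old_snapshot = "Alice is a physicist. She lives in Paris."
new_snapshot = (
    "Alice is a physicist. She lives in Berlin. In 2024 she joined Acme Labs."
)
candidates = ["Alice", "Paris", "Berlin", "Acme Labs"]  # illustrative entity list

facts = supporting_facts(old_snapshot, new_snapshot, candidates)
for entity, sentences in facts.items():
    print(entity, "->", sentences)
```

A document with an empty result would be dropped under this sketch, mirroring the caption's statement that only documents exhibiting entity updates are retained. The Knowledge Graph Extraction stage and its Wikidata relation constraints are not covered here.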