DocTER: Evaluating Document-based Knowledge Editing

Suhang Wu; Ante Wang; Minlong Peng; Yujie Lin; Wenbo Li; Mingming Sun; Jinsong Su

TL;DR

This work introduces DocTER, the first benchmark for document-based knowledge editing in large language models, and shows that editing with documents is significantly harder than editing with gold triples. It proposes an Extract-then-Edit pipeline that adapts existing triplet-based methods to document inputs, and evaluates edited models from four perspectives: Edit Success, Locality, Reasoning, and Cross-lingual Transfer. The study analyzes how the quality of extracted triples and the frequency and position of the edited knowledge in a document affect performance, and shows that external memory and reasoning-enhancement strategies can mitigate some of these challenges. The findings highlight practical considerations for real-world knowledge updates and point to future research on robust, multilingual document-based editing.

Abstract

Knowledge editing aims to correct outdated or inaccurate knowledge in neural networks. In this paper, we explore knowledge editing using easily accessible documents instead of manually labeled factual triples employed in earlier research. To advance this field, we establish the first evaluation benchmark, DocTER, featuring Documents containing counterfactual knowledge for editing. A comprehensive four-perspective evaluation is introduced: Edit Success, Locality, Reasoning, and Cross-lingual Transfer. To adapt conventional triplet-based knowledge editing methods for this task, we develop an Extract-then-Edit pipeline that extracts triples from documents before applying existing methods. Experiments on popular knowledge editing methods demonstrate that editing with documents presents significantly greater challenges than using triples. In document-based scenarios, even the best-performing in-context editing approach still lags behind by 10 points in editing success when compared to using gold triples. This observation also holds for both reasoning and cross-lingual test sets. We further analyze key factors influencing task performance, including the quality of extracted triples, the frequency and position of edited knowledge in documents, various methods for enhancing reasoning, and performance differences across various directions in cross-lingual knowledge editing, which provide valuable insights for future research.
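The two-stage shape of the Extract-then-Edit pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `Triple`, `toy_extractor`, `toy_memory_editor`, and `extract_then_edit` are hypothetical names, the extractor is a trivial line parser standing in for a real triple extractor, and a plain dictionary stands in for the language model being edited.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) fact. Hypothetical container for this sketch."""
    subject: str
    relation: str
    obj: str


# Assumption: a dict keyed by (subject, relation) stands in for the model's knowledge.
Memory = Dict[Tuple[str, str], str]
Extractor = Callable[[str], List[Triple]]
Editor = Callable[[Memory, Triple], None]


def toy_extractor(document: str) -> List[Triple]:
    """Toy stand-in for stage 1: read lines of the form 'subject | relation | object'.

    A real extractor would work on free text; this one only parses a fixed layout.
    """
    triples = []
    for line in document.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(Triple(*parts))
    return triples


def toy_memory_editor(memory: Memory, triple: Triple) -> None:
    """Toy stand-in for stage 2: overwrite the stored object for (subject, relation)."""
    memory[(triple.subject, triple.relation)] = triple.obj


def extract_then_edit(document: str, extractor: Extractor,
                      editor: Editor, memory: Memory) -> List[Triple]:
    """Stage 1 extracts triples from the document; stage 2 applies each as an edit."""
    triples = extractor(document)
    for triple in triples:
        editor(memory, triple)
    return triples


if __name__ == "__main__":
    memory = {("Eiffel Tower", "located in"): "Paris"}
    # Illustrative counterfactual document, not taken from the benchmark.
    doc = "Some prose.\nEiffel Tower | located in | Rome\n"
    applied = extract_then_edit(doc, toy_extractor, toy_memory_editor, memory)
    print(applied)
    print(memory)
```

Keeping the extractor and editor as interchangeable callables reflects the abstract's framing that extraction happens before existing triplet-based methods are applied. It also makes the dependency visible: any triple the first stage misses or distorts never reaches the editor correctly.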

Paper Structure (30 sections, 5 equations, 6 figures, 8 tables)

Introduction
Research Objectives
Related Works
Knowledge Editing Methods
Evaluation for Knowledge Editing
DocTER
Task Definition
Counterfactual Raw Documents Collection
The Topics of the Generated Documents
Four Perspective Evaluations
Edit Success
Locality
Reasoning
Cross-lingual Transfer
Pipeline: Extract-then-Edit
...and 15 more sections

Figures (6)

Figure 1: Scenario comparison between triplet-based knowledge editing and ours.
Figure 2: The overview of DocTER. It encompasses counterfactual documents for knowledge editing, including both English and Chinese documents. Our benchmark extends beyond conventional evaluation metrics like Edit Success and Locality, also assessing updated LLMs from the additional perspectives of Reasoning and Cross-lingual Transfer.
Figure 3: Left: Distribution of document topics for evaluating Edit Success, Reasoning, and Locality. Right: Distribution of document topics for assessing Cross-lingual Transfer.
Figure 4: In addition to document-level knowledge editing methods, our Extract-then-Edit Pipeline provides an alternative pathway for utilizing raw documents in knowledge editing through a two-stage process: triplet extraction followed by triplet-based knowledge editing.
Figure 5: Overview of our tool-based method pipeline. Strikethrough represents the triplet removed after this step.
...and 1 more figure
