StruEdit: Structured Outputs Enable the Fast and Accurate Knowledge Editing for Large Language Models

Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Hongcheng Gao, Junfeng Fang, Xueqi Cheng

TL;DR

This work addresses the challenge of keeping the knowledge of large language models current. It critiques the Locate-Then-Edit paradigm and introduces StruEdit, a structural editing approach that operates on structured reasoning triplets rather than natural-language outputs. By removing potentially affected information and refilling it from knowledge structures, StruEdit performs multi-hop edits in a single step without updating model parameters, achieving higher editing accuracy and lower latency. Experiments on MQuAKE with open- and closed-source models show that StruEdit consistently outperforms state-of-the-art model editing (ME) and in-context editing (ICE) methods and remains robust as the hop count and the number of edited instances increase. Its use of entity matching and relation selection within external knowledge structures offers practical benefits for robust, scalable knowledge editing in real-world QA settings.

Abstract

As the modern tool of choice for question answering, large language models (LLMs) are expected to deliver answers with up-to-date knowledge. To build such question-answering systems, popular knowledge editing methods generally aim to locate and then edit outdated knowledge in the natural language outputs. However, this target is challenging, as both identifying which tokens to edit in the reasoning steps and ensuring the coherence of the revised reasoning chain are difficult tasks. We argue that these challenges stem from the unstructured nature of natural language outputs. To address them, we propose $\textbf{Stru}$ctural $\textbf{Edit}$ing ($\textbf{StruEdit}$), an improved baseline for knowledge editing. We first prompt LLMs to produce structured outputs consisting of reasoning triplets. Then, StruEdit removes any potentially outdated knowledge and efficiently refills the structured outputs with up-to-date information in a single step. Experimental results show that StruEdit consistently delivers the highest accuracy with the lowest latency compared with other knowledge editing methods.
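
The remove-and-refill idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the triplet layout `(subject, relation, object)`, the dictionary-based knowledge structure, the placeholder entity names, and the helper names `strip_objects` and `refill` are all assumptions made for illustration, and exact dictionary lookup stands in for whatever matching the method actually performs.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object), assumed layout

# Hypothetical up-to-date knowledge structure: (entity, relation) -> entity.
# All entity names are placeholders, not data from the paper.
KNOWLEDGE: Dict[Tuple[str, str], str] = {
    ("Book X", "author"): "Author A",
    ("Author A", "citizen of"): "Country B",  # the edited fact
    ("Country B", "capital"): "City C",
}


def strip_objects(chain: List[Triplet]) -> Tuple[str, List[str]]:
    """Keep only the source entity and the relation sequence.

    Every object (and so every intermediate entity) is discarded, because
    any of them may come from outdated parametric knowledge.
    """
    if not chain:
        raise ValueError("empty reasoning chain")
    source = chain[0][0]
    relations = [relation for _, relation, _ in chain]
    return source, relations


def refill(source: str, relations: List[str],
           knowledge: Dict[Tuple[str, str], str]) -> List[Triplet]:
    """Rebuild the chain hop by hop from the up-to-date knowledge structure."""
    chain: List[Triplet] = []
    current = source
    for relation in relations:
        nxt = knowledge[(current, relation)]  # KeyError if the fact is missing
        chain.append((current, relation, nxt))
        current = nxt
    return chain


# Structured output as a model might produce it, with an outdated second hop.
model_chain: List[Triplet] = [
    ("Book X", "author", "Author A"),
    ("Author A", "citizen of", "Country Z"),
    ("Country Z", "capital", "City Y"),
]

source, relations = strip_objects(model_chain)
edited_chain = refill(source, relations, KNOWLEDGE)
print(edited_chain[-1][2])  # prints City C
```

Because the objects are dropped before refilling, the outdated second hop never has to be located explicitly, and every later hop stays consistent with the edited fact.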

Paper Structure (28 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Comparison of performance between model editing (ME), in-context editing (ICE), and our $\textsc{StruEdit}$ on multi-hop editing tasks, showing editing accuracy and average inference speed. Our $\textsc{StruEdit}$ demonstrates the highest editing accuracy while maintaining the lowest latency.
  • Figure 2: Differences between ME, ICE methods, and our structural editing. ME and ICE first locate the position of edited facts within the natural language reasoning steps (ME identifies modification regions, while ICE retrieves relevant new knowledge) before editing. Both face challenges with incorrect localization and inconsistent reasoning due to the natural language output format. In contrast, structural editing removes LLMs' parametric knowledge and reasons over up-to-date knowledge structures using structured output logic to derive the final answer.
  • Figure 3: An illustration showing how $\textsc{StruEdit}$ answers multi-hop questions using new knowledge. For a multi-hop question, $\textsc{StruEdit}$ first guides LLMs to generate a reasoning chain using their parametric knowledge. It then extracts the source entity and sequential relations, matches the source entity within an external knowledge structure, and selects based on the sequential relations during reasoning to arrive at the final answer.
  • Figure 4: The query template has two components: prefix_question, a selective question, and candidate_description, describing the candidate set $C=\{c_1, c_2, \ldots, c_{|C|}\}$, which represents either all entities or the relations associated with $e_{i-1}$. <feature> denotes the textual description of entities or relations.
  • Figure 5: Multi-hop QA results across 2, 3, and 4 hops on both open-source (LLaMA2-7B-Chat) and closed-source (GPT-3.5-Turbo-Instruct) models for ME, ICE methods, and our $\textsc{StruEdit}$.
  • ...and 1 more figure
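
The two-part query template from the Figure 4 caption could be assembled roughly as follows. This is a sketch under stated assumptions: only the component names prefix_question and candidate_description come from the caption, while the function name `build_query`, the numbered-list layout, the "Candidates:" wording, and the example question are hypothetical and not taken from the paper.

```python
from typing import Sequence


def build_query(prefix_question: str, candidates: Sequence[str]) -> str:
    """Join a selective question with a description of the candidate set C.

    Each candidate string plays the role of <feature>: the textual
    description of an entity or relation. The layout is an assumption.
    """
    if not candidates:
        raise ValueError("candidate set C must not be empty")
    numbered = [f"{i}. {feature}" for i, feature in enumerate(candidates, start=1)]
    candidate_description = "Candidates:\n" + "\n".join(numbered)
    return f"{prefix_question}\n{candidate_description}"


# Hypothetical relation-selection query for the relations attached to e_{i-1}.
query = build_query(
    "Which relation matches 'citizen of'?",
    ["country of citizenship", "place of birth", "author"],
)
print(query)
```

The same builder serves both steps named in the caption: pass entity descriptions as candidates when matching the source entity, or the relations associated with $e_{i-1}$ when selecting the next hop.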