Can Knowledge Editing Really Correct Hallucinations?

Baixiang Huang, Canyu Chen, Xiongxiao Xu, Ali Payani, Kai Shu

TL;DR

This work asks whether knowledge editing can truly correct hallucinations in large language models and introduces HalluEditBench, a holistic benchmark built from a Wikidata-based hallucination dataset spanning 9 domains and 26 topics with over 6,000 verified hallucinations. It generates evaluation QA pairs across five facets (Efficacy, Generalization, Portability, Locality, and Robustness) and benchmarks seven editing methods (FT-L, FT-M, MEMIT, ROME, LoRA, ICE, GRACE). Results show that while ICE and GRACE often yield the best Efficacy, those gains do not consistently carry over to Generalization, Portability, Locality, or Robustness, and performance varies markedly by domain and LLM. These findings suggest that existing knowledge-editing approaches have limited real-world effectiveness in hallucination correction and motivate more robust, facet-aware benchmarking and method development.

Abstract

Large Language Models (LLMs) suffer from hallucinations, that is, non-factual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has been developed as a popular new paradigm for correcting erroneous factual knowledge encoded in LLMs, with the advantage of avoiding retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure that LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, the resulting performance cannot be used directly to assess how effectively different knowledge editing methods correct hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods holistically on five dimensions: Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.

Paper Structure

This paper contains 27 sections, 18 figures, 1 table.

Figures (18)

  • Figure 1: Framework of HalluEditBench. For real-world hallucinations, we holistically assess the performance of knowledge editing on Efficacy, Generalization, Portability, Locality, and Robustness.
  • Figure 2: Statistics of HalluEditBench Across Topics and Domains.
  • Figure 3: Efficacy Scores of Knowledge Editing Methods. The "overall" refers to the Efficacy Score (%) on the whole HalluEditBench, covering 9 domains, for different methods. The Efficacy Score on each domain is also reported. Efficacy Scores (%) are measured by the accuracy on Efficacy Evaluation Question-answer Pairs, where the pre-edit score of each LLM is ensured to be $0\%$.
  • Figure 4: Generalization Scores of Knowledge Editing Methods. Generalization Scores (%) are measured by accuracy on five types of Generalization Evaluation Questions: Rephrased Questions ("rephrase"), Yes-or-No Questions with "Yes" or "No" as answers ("yes" or "no"), Multi-Choice Questions ("mc"), and Reversed Questions ("reversed"). The "average" refers to averaged scores over the five question types. The figure only shows the overall Generalization Scores for each type on the whole HalluEditBench. Generalization Scores for each domain are given in the appendix "Generalization Scores of Knowledge Editing Methods on All the Domains".
  • Figure 5: Portability Scores of Knowledge Editing Methods. Portability Scores (%) are measured by the accuracy on Portability Evaluation Questions, which are Efficacy Evaluation Questions with $N$ hops ($N = 1 \sim 6$). The Portability Evaluation Questions are the same as the Efficacy Evaluation Questions when $N$ is 1. The Portability Scores on two domains, "human" and "places", are reported in the figure. The results for more domains are given in the appendix "Portability Scores of Knowledge Editing Methods on All the Domains". The "overall" refers to the Portability Score (%) on the whole HalluEditBench, covering 9 domains.
  • ...and 13 more figures
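
The captions above describe each score as an accuracy (%) over facet-specific question-answer pairs, with the Generalization "average" taken over five question types. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The record layout (`facet`, `prediction`, `answer`), the helper names, and the case-insensitive exact-match criterion are all assumptions made for illustration; the benchmark may judge correctness differently.

```python
from collections import defaultdict

# Labels taken from the Figure 4 caption; their use as dictionary keys is an assumption.
GENERALIZATION_TYPES = ("rephrase", "yes", "no", "mc", "reversed")


def normalize(text):
    """Lowercase and collapse whitespace (assumed matching rule, not from the paper)."""
    return " ".join(text.lower().split())


def facet_scores(records):
    """Return accuracy in percent per facet.

    Each record is a dict with hypothetical keys 'facet', 'prediction', 'answer'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        facet = record["facet"]
        total[facet] += 1
        if normalize(record["prediction"]) == normalize(record["answer"]):
            correct[facet] += 1
    return {facet: 100.0 * correct[facet] / total[facet] for facet in total}


def generalization_average(scores, types=GENERALIZATION_TYPES):
    """Average the per-type Generalization scores that are present in `scores`."""
    present = [scores[t] for t in types if t in scores]
    return sum(present) / len(present) if present else None


# Toy post-edit outputs, invented purely to exercise the functions.
records = [
    {"facet": "efficacy", "prediction": "Paris", "answer": "paris"},
    {"facet": "efficacy", "prediction": "Lyon", "answer": "Paris"},
    {"facet": "rephrase", "prediction": "Paris ", "answer": "Paris"},
    {"facet": "yes", "prediction": "No", "answer": "Yes"},
]
scores = facet_scores(records)
average = generalization_average(scores)
```

Keeping the scores per facet rather than pooling them reflects the benchmark's central point: a high Efficacy score does not imply high scores on the other facets, so each one needs to be reported separately.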