FAME: Towards Factual Multi-Task Model Editing

Li Zeng, Yingyu Shan, Zeming Liu, Jiashu Yao, Yuhang Guo

TL;DR

This paper presents FAME, an authentic, comprehensive, multi-task dataset designed to enhance the practicality of model editing, and SKEME, a model editing method that uses a novel caching mechanism to stay synchronized with the real world.

Abstract

Large language models (LLMs) embed extensive knowledge and use it to perform exceptionally well across various tasks. Nevertheless, outdated knowledge or factual errors within LLMs can lead to misleading or incorrect responses, causing significant issues in practical applications. To rectify these flaws without costly model retraining, various model editing approaches have been proposed to correct inaccurate knowledge within LLMs in a cost-efficient way. To evaluate these model editing methods, previous work introduced a series of datasets. However, most of these datasets contain only fabricated data in a single format, which diverges from real-world model editing scenarios and raises doubts about their usability in practice. To facilitate the application of model editing in real-world scenarios, we pose the challenge of practicality. To address this challenge and effectively enhance the capabilities of LLMs, we present FAME, a factual, comprehensive, and multi-task dataset designed to enhance the practicality of model editing. We then propose SKEME, a model editing method that uses a novel caching mechanism to ensure synchronization with the real world. Experiments demonstrate that SKEME performs excellently across various tasks and scenarios, confirming its practicality.

Paper Structure

This paper contains 54 sections, 8 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: An example of FAME. LLMs may develop factual inaccuracies over time, which can be corrected through model editing. While previous datasets employed fabricated data, FAME utilizes real-world data to improve the performance of LLMs in practical usage.
  • Figure 2: An overview of SKEME. SKEME first extracts key entities from the question. It then searches the knowledge base for facts related to those entities, ranks the applicable knowledge items, and uses in-context learning to modify the model's output. Additionally, we update knowledge from external databases and the real world to ensure that the local knowledge base reflects real-world changes.
  • Figure 3: Result of RQ1. The x-axis indicates the number of edits to the same fact.
  • Figure 4: Result of RQ3. The x-axis represents the number of edited facts.
  • Figure 5: SPARQL code used to query equivalent relations.
  • ...and 8 more figures
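
The pipeline described in the Figure 2 caption (extract entities, retrieve related facts, rank them, then edit the output through in-context learning, with the local knowledge base refreshed over time) can be illustrated with a small sketch. This is not the authors' implementation: the `Fact` class, the in-memory `KNOWLEDGE_BASE`, the substring-based entity matching, the token-overlap ranking, and the prompt template are all hypothetical stand-ins chosen only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Hypothetical (subject, relation, object) knowledge item."""
    subject: str
    relation: str
    obj: str

    def as_sentence(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}."


# Assumed local knowledge base (the "cache"), keyed by entity name.
KNOWLEDGE_BASE: dict[str, list[Fact]] = {
    "France": [
        Fact("France", "has capital", "Paris"),
        Fact("France", "has official language", "French"),
    ],
    "Japan": [Fact("Japan", "has capital", "Tokyo")],
}


def extract_entities(question: str) -> list[str]:
    """Toy entity extraction: known entity names that appear in the question."""
    lowered = question.lower()
    return [name for name in KNOWLEDGE_BASE if name.lower() in lowered]


def retrieve(entities: list[str]) -> list[Fact]:
    """Collect every cached fact attached to the extracted entities."""
    return [fact for e in entities for fact in KNOWLEDGE_BASE.get(e, [])]


def rank(question: str, facts: list[Fact], top_k: int = 1) -> list[Fact]:
    """Toy ranking: count tokens shared by the question and the relation."""
    q_tokens = set(question.lower().replace("?", "").split())

    def score(fact: Fact) -> int:
        return len(q_tokens & set(fact.relation.lower().split()))

    return sorted(facts, key=score, reverse=True)[:top_k]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Place the ranked facts in front of the question for in-context editing."""
    context = "\n".join(f.as_sentence() for f in facts)
    return f"Facts:\n{context}\n\nQuestion: {question}\nAnswer:"


def upsert(fact: Fact) -> None:
    """Refresh the cache: replace any fact with the same subject and relation."""
    kept = [f for f in KNOWLEDGE_BASE.get(fact.subject, [])
            if f.relation != fact.relation]
    KNOWLEDGE_BASE[fact.subject] = kept + [fact]


question = "What is the capital of France?"
top_facts = rank(question, retrieve(extract_entities(question)))
prompt = build_prompt(question, top_facts)
print(prompt)
```

The resulting prompt would be passed to the LLM in place of the bare question, so the model's weights are never modified; calling `upsert` with a newer fact changes what later prompts contain, which is the role the caption assigns to updating the local knowledge base.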