MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

Junzhe Zhang; Huixuan Zhang; Xunjian Yin; Baizhou Huang; Xu Zhang; Xinyu Hu; Xiaojun Wan

TL;DR

This work presents MC-MKE, a fine-grained Multimodal Knowledge Editing benchmark emphasizing Modality Consistency, and evaluates four multimodal knowledge editing methods on MC-MKE, revealing their limitations, particularly in terms of modality consistency.

Abstract

Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge, which, given the complexity of multimodal knowledge, can manifest as misreading or misrecognition errors. Previous benchmarks have not systematically analyzed how well editing methods correct these two error types. To better represent and correct these errors, we decompose multimodal knowledge into its visual and textual components. Different error types correspond to different editing formats, each of which edits a distinct part of the multimodal knowledge. We present MC-MKE, a fine-grained Multimodal Knowledge Editing benchmark emphasizing Modality Consistency. Our benchmark allows misreading and misrecognition errors to be corrected independently by editing the corresponding knowledge component. We evaluate four multimodal knowledge editing methods on MC-MKE, revealing their limitations, particularly in terms of modality consistency. Our work highlights the challenges of multimodal knowledge editing and motivates further research into effective techniques for this task.
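To make the decomposition concrete, here is a minimal toy sketch (not the benchmark's code or any editing method's implementation) that assumes multimodal knowledge can be modeled as a visual component mapping an image to an entity (the subject s) plus a textual component mapping (s, r) to o. Under that assumption, a misrecognition error lives in the visual component and a misreading error lives in the textual component, so each can be fixed without touching the other. The class `MultimodalKnowledge`, its methods, the image id `img_1`, and the team names `Team A/B/C` are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultimodalKnowledge:
    """Hypothetical toy store splitting multimodal knowledge into two parts."""

    # Visual component: image id -> recognized entity (the subject s).
    visual: dict[str, str] = field(default_factory=dict)
    # Textual component: (subject s, relation r) -> object o.
    textual: dict[tuple[str, str], str] = field(default_factory=dict)

    def recognize(self, image_id: str) -> str:
        return self.visual[image_id]

    def answer(self, image_id: str, relation: str) -> str:
        # An image question composes recognition with a textual fact lookup.
        return self.textual[(self.recognize(image_id), relation)]

    def edit_visual(self, image_id: str, entity: str) -> None:
        # Corrects a misrecognition error; textual facts are untouched.
        self.visual[image_id] = entity

    def edit_textual(self, subject: str, relation: str, obj: str) -> None:
        # Corrects a misreading error; image-to-entity links are untouched.
        self.textual[(subject, relation)] = obj


if __name__ == "__main__":
    # "img_1" actually shows Mac Allister; team names are placeholders.
    kb = MultimodalKnowledge(
        visual={"img_1": "Messi"},
        textual={("Messi", "plays for"): "Team B",
                 ("Mac Allister", "plays for"): "Team C"},
    )
    print(kb.answer("img_1", "plays for"))  # Team B (caused by misrecognition)
    kb.edit_visual("img_1", "Mac Allister")
    print(kb.answer("img_1", "plays for"))  # Team C
    kb.edit_textual("Messi", "plays for", "Team A")  # fix a misread fact
    print(kb.textual[("Messi", "plays for")])  # Team A
```

In a real MLLM these components are entangled in model weights rather than stored in dictionaries; the toy only illustrates why targeting the right component matters.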

Paper Structure (32 sections, 10 equations, 2 figures, 11 tables)

Introduction
Related Works
Knowledge Editing
Multimodal Models
Multimodal Knowledge Editing
Definition of Multimodal Knowledge
Definition of MMEdit
Requirements of MMEdit Method
MC-MKE Benchmark Construction
Data Selection
Dataset Construction
Editing Dataset Construction
Experiments
MMEdit Methods
Results & Analysis
...and 17 more sections

Figures (2)

Figure 1: An illustration of multimodal knowledge and the two types of multimodal errors: misrecognizing a picture of Mac Allister as Messi, and misreading Messi's football team.
Figure 2: The upper panel illustrates editing different components of MLLMs. The lower panel gives an overview of the editing formats: given an input image and its corresponding textual knowledge $(s, r, o)$, it shows three different editing formats. Although the final output is the same, the edited multimodal knowledge differs depending on whether its visual or textual knowledge is edited, and the consistency property also differs across edit inputs.
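The point that identical final outputs can hide different consistency behavior can be illustrated with a toy sketch. It assumes, purely for illustration, that "modality consistency" means the answer to an image-based query matches the answer to the equivalent text-based query about the recognized entity. It contrasts a textual edit with a hypothetical output-level patch that rewrites only the (image, relation) answer; this patch is an illustrative stand-in, not necessarily one of the three formats in Figure 2. `ToyMLLM`, `overrides`, `modality_consistent`, `img_2`, and the team names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyMLLM:
    """Hypothetical toy model with separate image and text query paths."""

    visual: dict[str, str]              # image id -> entity
    textual: dict[tuple[str, str], str]  # (subject, relation) -> object
    # Hypothetical output-level patch: (image id, relation) -> object.
    overrides: dict[tuple[str, str], str] = field(default_factory=dict)

    def ask_image(self, image_id: str, relation: str) -> str:
        if (image_id, relation) in self.overrides:
            return self.overrides[(image_id, relation)]
        return self.textual[(self.visual[image_id], relation)]

    def ask_text(self, subject: str, relation: str) -> str:
        return self.textual[(subject, relation)]


def modality_consistent(model: ToyMLLM, image_id: str, relation: str) -> bool:
    # Consistent if the image query and the equivalent text query agree.
    subject = model.visual[image_id]
    return model.ask_image(image_id, relation) == model.ask_text(subject, relation)


def make_model() -> ToyMLLM:
    # "img_2" shows Messi; the stored team "Team B" is outdated.
    return ToyMLLM(visual={"img_2": "Messi"},
                   textual={("Messi", "plays for"): "Team B"})


if __name__ == "__main__":
    textual_edit = make_model()
    textual_edit.textual[("Messi", "plays for")] = "Team A"
    patched = make_model()
    patched.overrides[("img_2", "plays for")] = "Team A"
    for name, m in [("textual edit", textual_edit), ("output patch", patched)]:
        print(name, m.ask_image("img_2", "plays for"),
              modality_consistent(m, "img_2", "plays for"))
    # textual edit Team A True
    # output patch Team A False
```

Both edits make the image query return the same target, but only the one that changes the underlying textual knowledge keeps the text path in agreement.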
