VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
TL;DR
VLKEB introduces a dedicated knowledge editing benchmark for large vision-language models (LVLMs) that leverages a multi-modal knowledge graph to ground edits in real images and entities. It extends the Portability metric and provides a comprehensive evaluation framework covering multiple editing methods on five LVLMs, uncovering strengths and weaknesses in reliability, generality, locality, and portability (transfer of edited knowledge to related content). The experiments reveal that in-context and memory-based approaches often excel in single-edit scenarios and in portability, while parameter-update methods, including fine-tuning, struggle with long-horizon or multi-hop edits, highlighting the need for LVLM-specific editing strategies. The work offers practical insights and a valuable dataset to propel research on robust, transferable knowledge editing for multi-modal models, with clear directions for improving portability and handling sequential edits.
Abstract
Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared with LLM editing, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of its synthesized evaluation images and cannot assess whether models apply edited knowledge in relevant content. Therefore, we employ more reliable data collection methods to construct a new Large Vision-Language Model Knowledge Editing Benchmark (VLKEB) and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, we bind our image data to knowledge entities. These bindings can further be used to extract entity-related knowledge, which forms the basis of the editing data. We conduct experiments with different editing methods on five LVLMs and thoroughly analyze how they impact the models. The results reveal the strengths and deficiencies of these methods and, we hope, provide insights for future research. The code and dataset are available at: https://github.com/VLKEB/VLKEB.
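To make the four metrics concrete, here is a minimal sketch of how they might be scored for a batch of edits. It is not the VLKEB implementation: the model interface (`image, question -> answer`), the `EditCase` fields, and the use of exact string match (published evaluations often use token-level accuracy instead) are all assumptions made for illustration. Reliability checks the edit prompt itself, Generality checks a rephrased question paired with another image of the same entity, Locality checks that an unrelated input keeps its pre-edit answer, and Portability checks a question that can only be answered by applying the edited knowledge.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical model interface: (image_id, question) -> answer string.
Model = Callable[[str, str], str]


@dataclass
class EditCase:
    """One edit plus its probe inputs (all field names are illustrative)."""
    image: str               # image of the entity being edited
    question: str            # edit prompt
    target: str              # new answer the edit should inject
    rephrase_image: str      # another image of the same entity (generality)
    rephrase_question: str   # paraphrased edit prompt (generality)
    locality_image: str      # unrelated input whose answer should not change
    locality_question: str
    port_question: str       # question requiring the edited knowledge (portability)
    port_answer: str


def _match(pred: str, gold: str) -> bool:
    # Simplification: case-insensitive exact match instead of token-level accuracy.
    return pred.strip().lower() == gold.strip().lower()


def evaluate(pre: Model, post: Model, cases: Sequence[EditCase]) -> dict[str, float]:
    """Average each metric over the cases; `pre` is the unedited model, `post` the edited one."""
    if not cases:
        raise ValueError("need at least one edit case")
    n = len(cases)
    reliability = sum(_match(post(c.image, c.question), c.target) for c in cases) / n
    generality = sum(
        _match(post(c.rephrase_image, c.rephrase_question), c.target) for c in cases
    ) / n
    # Locality compares post-edit output against the pre-edit output, not a gold label.
    locality = sum(
        _match(post(c.locality_image, c.locality_question),
               pre(c.locality_image, c.locality_question))
        for c in cases
    ) / n
    portability = sum(
        _match(post(c.image, c.port_question), c.port_answer) for c in cases
    ) / n
    return {
        "reliability": reliability,
        "generality": generality,
        "locality": locality,
        "portability": portability,
    }
```

Scoring Locality against the unedited model's own output, rather than a gold answer, reflects what the metric is meant to capture: that an edit leaves unrelated behavior unchanged, whether or not that behavior was correct to begin with.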
