When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations

Kailin Jiang, Yuntao Du, Yukai Ding, Yuchen Ren, Ning Jiang, Zhi Gao, Zilong Zheng, Lei Liu, Bin Li, Qing Li

TL;DR

The authors propose a pipeline to construct MMEVOKE and introduce knowledge augmentation and knowledge retention methods. They find that knowledge-aware augmentation strengthens knowledge injection performance and that Data Replay and MoE methods effectively mitigate capability degradation.

Abstract

Large Multimodal Models (LMMs) store vast amounts of pretrained knowledge but struggle to remain aligned with real-world updates, and it is difficult for them to acquire evolving knowledge without capability degradation. Furthermore, most current work explores static textual knowledge injection and neglects dynamic multimodal evolving knowledge injection, so the potential of LMMs for multimodal knowledge injection remains an open question. To address this, we first propose a pipeline to construct MMEVOKE, a benchmark for evaluating multimodal evolving knowledge injection in LMMs. MMEVOKE contains 9,422 samples spanning 159 subtypes. Then, through knowledge injection tests and general capability tests in extensive experiments on MMEVOKE, we reveal challenges in existing knowledge injection methods, such as poor injection performance and capability degradation. Finally, to tackle these challenges, we introduce knowledge augmentation and knowledge retention methods, finding that knowledge-aware augmentation strengthens knowledge injection performance and that Data Replay and MoE methods effectively mitigate capability degradation.
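Data Replay is a standard continual-learning technique: while fine-tuning on new (evolving) knowledge, a portion of earlier or general-purpose training data is mixed back into the training set so the model keeps rehearsing previously acquired capabilities. The Python sketch below illustrates only this generic data-mixing step. The helper `mix_with_replay` and its `replay_ratio` parameter are hypothetical; this is not the authors' implementation, and the replay data source and mixing ratio used in the paper are not stated in this summary.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_with_replay(new_data: Sequence[T], replay_pool: Sequence[T],
                    replay_ratio: float, seed: int = 0) -> List[T]:
    """Return new_data plus a random subset of replay_pool, shuffled together.

    replay_ratio (hypothetical parameter) sets how many replayed examples are
    added relative to len(new_data); the count is capped at the pool size.
    """
    if replay_ratio < 0:
        raise ValueError("replay_ratio must be non-negative")
    rng = random.Random(seed)
    # Number of old examples to rehearse alongside the new knowledge.
    n_replay = min(len(replay_pool), round(replay_ratio * len(new_data)))
    replayed = rng.sample(list(replay_pool), n_replay)
    mixed = list(new_data) + replayed
    rng.shuffle(mixed)  # interleave new and replayed examples
    return mixed


# Usage: 10 new-knowledge samples mixed with 5 replayed general samples.
new = [("evolving", i) for i in range(10)]
old = [("general", i) for i in range(100)]
mixed = mix_with_replay(new, old, replay_ratio=0.5)
```

Capping the replay count at the pool size keeps the function valid when the retained pool is small; in practice the ratio trades off injection of new knowledge against retention of general capabilities.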

Paper Structure

This paper contains 42 sections, 2 equations, 27 figures, and 9 tables.

Figures (27)

  • Figure 1: Motivation and Overview of MMEvoke. A fundamental limitation of trained LMMs is their static nature, which causes their inherent knowledge to become outdated and inaccurate over time. Addressing this requires methods for efficiently acquiring evolving knowledge. To facilitate research in this direction, we propose MMEvoke to specifically evaluate the knowledge injection performance of LMMs when confronted with evolving knowledge.
  • Figure 2: Overview of the construction pipeline for MMEvoke. For the heuristic query, we manually write multiple templates and randomly select one template for each sample.
  • Figure 3: Key Statistics of MMEvoke.
  • Figure 4: Area and Subfield Distribution of MMEvoke. “Others” for Entity and News includes subfields that are not displayed individually.
  • Figure 5: Performance of knowledge injection methods on MMEvoke.
  • ...and 22 more figures
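The Figure 2 caption describes one concrete construction step: for each sample, one of several manually written templates is chosen at random to form the heuristic query. The following is a minimal Python sketch of that step only. The template strings, the `kind` placeholder field, and the `build_heuristic_query` helper are all hypothetical illustrations, not the paper's actual templates or code.

```python
import random
from typing import Dict, List

# Hypothetical hand-written templates; the paper's real templates are not reproduced here.
TEMPLATES: List[str] = [
    "What is the name of the {kind} shown in the image?",
    "Which {kind} does this picture depict?",
    "Identify the {kind} in this image.",
]


def build_heuristic_query(sample: Dict[str, str], rng: random.Random) -> str:
    """Pick one template uniformly at random and fill it with the sample's fields."""
    template = rng.choice(TEMPLATES)
    return template.format(**sample)


# Usage: a seeded generator makes the per-sample template choice reproducible.
rng = random.Random(42)
samples = [{"kind": "entity"}, {"kind": "news event"}]
queries = [build_heuristic_query(s, rng) for s in samples]
```

Drawing a template independently for every sample varies the query wording across the benchmark, so results are less tied to one fixed phrasing.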