
Continual Multimodal Knowledge Graph Construction

Xiang Chen, Jintian Zhang, Xiaohan Wang, Ningyu Zhang, Tongtong Wu, Yuxiang Wang, Yongheng Wang, Huajun Chen

TL;DR

This work tackles continual multimodal knowledge graph construction (MKGC), addressing the challenge of catastrophic forgetting as new entities and relations continually emerge. It introduces MSPT, a dual-stream Transformer framework that combines gradient modulation for balanced learning with hand-in-hand multimodal interaction and attention distillation to preserve past knowledge. The authors also establish incremental MKGC benchmarks (IMNER and IMRE) and show MSPT outperforms both multimodal MKGC baselines and traditional continual learning methods, with strong plasticity and robust stability. The study demonstrates that careful management of inter-modal learning dynamics and attention patterns yields superior performance in evolving multimodal knowledge environments, with practical implications for real-world streaming data scenarios.
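The TL;DR mentions gradient modulation for balanced learning across the text and visual streams. This summary does not reproduce the paper's exact formulation, so what follows is only a minimal sketch that assumes an OGM-GE-style scheme. Each modality's confidence on the gold label is summed over a batch. The ratio between the two scores shows which modality dominates, and the dominant modality's gradients are then damped by a coefficient below 1. The function names and the `alpha` hyperparameter are hypothetical. So is the shifted ratio (ratio - 1), which we chose so that the coefficient is exactly 1 when the modalities are balanced. We have not confirmed whether this ratio corresponds to the contribution ratio $\gamma^{t}_{n}$ in Figure 4.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of OGM-GE-style gradient modulation between a text
# stream and a visual stream. Not the MSPT paper's exact formulation.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modality_score(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Sum over the batch of the probability a unimodal head assigns to the gold label."""
    return sum(softmax(logits)[y] for logits, y in zip(batch_logits, labels))


def modulation_coefficients(
    text_logits: Sequence[Sequence[float]],
    visual_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    alpha: float = 0.5,  # hypothetical modulation strength
) -> Tuple[float, float]:
    """Return (k_text, k_visual); the dominant modality gets a coefficient below 1."""
    s_text = modality_score(text_logits, labels)
    s_vis = modality_score(visual_logits, labels)
    ratio_text = s_text / s_vis
    ratio_vis = s_vis / s_text
    # Shifted ratio (ratio - 1) is our assumption: the coefficient is exactly 1
    # for the weaker modality and when both modalities are balanced.
    k_text = 1.0 - math.tanh(alpha * max(ratio_text - 1.0, 0.0))
    k_vis = 1.0 - math.tanh(alpha * max(ratio_vis - 1.0, 0.0))
    return k_text, k_vis


def scale_gradients(grads: Sequence[float], k: float) -> List[float]:
    """Damp one encoder's gradients by its modulation coefficient before the optimizer step."""
    return [g * k for g in grads]


if __name__ == "__main__":
    labels = [0, 1]
    text_logits = [[4.0, 0.0], [0.0, 4.0]]    # confident text head
    visual_logits = [[0.5, 0.0], [0.0, 0.5]]  # weak visual head
    k_text, k_vis = modulation_coefficients(text_logits, visual_logits, labels)
    print(f"k_text={k_text:.3f}, k_vis={k_vis:.3f}")
    print(scale_gradients([0.2, -0.4], k_text))
```

With these toy logits the text stream dominates. Its coefficient therefore falls to about 0.72 (with alpha = 0.5), while the visual coefficient stays at 1.0. In a real model, the scaled gradients would be those of the text encoder's parameters.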

Abstract

Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations and often succumb to catastrophic forgetting, the loss of previously acquired knowledge. This study introduces benchmarks to foster the development of continual MKGC. We further introduce the MSPT framework, designed to overcome the shortcomings of existing MKGC approaches when processing multimedia data. MSPT balances the retention of learned knowledge (stability) with the integration of new data (plasticity) and outperforms current continual learning and multimodal methods. Our results confirm MSPT's superior performance in evolving knowledge environments and its capacity to navigate the balance between stability and plasticity.
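The abstract frames continual MKGC as a balance between stability and plasticity, and the TL;DR credits attention distillation with preserving past knowledge. A common way to express this is a weighted objective, sketched here under stated assumptions. The task loss on the current data drives plasticity. A penalty keeps the new model's attention maps close to those of the frozen model from the previous task, which supports stability. Several choices here are hypothetical: KL divergence as the distance, the weight `lam`, and all function names. MSPT may use a different distance, layer selection, or weighting.

```python
import math
from typing import Sequence

# Hypothetical sketch: attention distillation as a stability term.
# Attention maps are nested lists shaped [heads][queries][keys];
# each innermost row is a probability distribution over keys.
AttentionMap = Sequence[Sequence[Sequence[float]]]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def attention_distillation_loss(old_attn: AttentionMap, new_attn: AttentionMap) -> float:
    """Mean KL from the frozen previous-task model's attention rows to the current model's."""
    total, count = 0.0, 0
    for old_head, new_head in zip(old_attn, new_attn):
        for p, q in zip(old_head, new_head):
            total += kl_divergence(p, q)
            count += 1
    if count == 0:
        raise ValueError("attention maps must contain at least one row")
    return total / count


def continual_loss(task_loss: float, old_attn: AttentionMap, new_attn: AttentionMap,
                   lam: float = 1.0) -> float:
    """Plasticity (task loss on new data) plus lam-weighted stability (attention distillation)."""
    return task_loss + lam * attention_distillation_loss(old_attn, new_attn)


if __name__ == "__main__":
    old = [[[0.5, 0.5], [0.2, 0.8]]]  # one head, two query rows, from the old model
    new = [[[0.9, 0.1], [0.2, 0.8]]]  # first row has drifted in the new model
    print(attention_distillation_loss(old, new))
    print(continual_loss(task_loss=0.7, old_attn=old, new_attn=new, lam=0.5))
```

Raising `lam` favors stability by pulling the model toward its old attention patterns, and lowering it favors plasticity. In this toy example only the drifted first row contributes to the penalty.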

Paper Structure

This paper contains 30 sections, 15 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Results on incremental MRE (IMRE) benchmark. We benchmark MSPT against the Vanilla Training approach, multimodal KGC models such as MEGA and MKGformer, as well as the continual RE method RP-CRE.
  • Figure 2: Overview of our MSPT framework.
  • Figure 3: Performance in plasticity on the IMRE Benchmark.
  • Figure 4: Change of contribution ratio $\gamma^{t}_{n}$ during training.
  • Figure 5: Analysis on rehearsal size.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Remark 1
  • Remark 2