Continual Multimodal Knowledge Graph Construction
Xiang Chen, Jintian Zhang, Xiaohan Wang, Ningyu Zhang, Tongtong Wu, Yuxiang Wang, Yongheng Wang, Huajun Chen
TL;DR
This work tackles continual multimodal knowledge graph construction (MKGC), addressing the challenge of catastrophic forgetting as new entities and relations continually emerge. It introduces MSPT, a dual-stream Transformer framework that combines gradient modulation for balanced learning with hand-in-hand multimodal interaction and attention distillation to preserve past knowledge. The authors also establish incremental MKGC benchmarks (IMNER and IMRE) and show MSPT outperforms both multimodal MKGC baselines and traditional continual learning methods, with strong plasticity and robust stability. The study demonstrates that careful management of inter-modal learning dynamics and attention patterns yields superior performance in evolving multimodal knowledge environments, with practical implications for real-world streaming data scenarios.
Abstract
Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations, often succumbing to catastrophic forgetting, the loss of previously acquired knowledge. This study introduces benchmarks aimed at fostering the development of the continual MKGC domain. We further introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing. MSPT harmonizes the retention of learned knowledge (stability) and the integration of new data (plasticity), outperforming current continual learning and multimodal methods. Our results confirm MSPT's superior performance in evolving knowledge environments, showcasing its capacity to navigate the balance between stability and plasticity.
