Noise-powered Multi-modal Knowledge Graph Representation Framework

Zhuo Chen; Yin Fang; Yichi Zhang; Lingbing Guo; Jiaoyan Chen; Jeff Z. Pan; Huajun Chen; Wen Zhang

TL;DR

SNAG, the Noise-powered Multi-modal Knowledge Graph Representation Framework, is a Transformer-based encoder for unified Multi-Modal Knowledge Graph (MMKG) representation learning, targeting Multi-modal Knowledge Graph Completion (MKGC) and Multi-modal Entity Alignment (MMEA). It deliberately injects modality-level noise through Gauss Modality Noise Masking and fuses multi-modal features with an entity-level modality interaction module, reporting state-of-the-art results across ten datasets with a compact 13M-parameter footprint. The framework works both as a standalone encoder and as an enhancement to existing methods; it is validated on MKGC and MMEA benchmarks and demonstrated for open-ended downstream applications. The authors position it as a practical, efficient step toward robust multi-modal knowledge injection for large language models and MMKG pre-training, with the aim of reducing knowledge misconceptions and multi-modal hallucinations in downstream systems.

Abstract

The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph (MMKG) representation learning framework. Such a framework is essential for embedding structured knowledge into multi-modal Large Language Models effectively, alleviating issues like knowledge misconceptions and multi-modal hallucinations. In this work, we explore the efficacy of models in accurately embedding entities within MMKGs through two pivotal tasks: Multi-modal Knowledge Graph Completion (MKGC) and Multi-modal Entity Alignment (MMEA). Building on this foundation, we propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking to robustly integrate multi-modal entity features in KGs. By incorporating specific training objectives for both MKGC and MMEA, our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility. Moreover, SNAG can not only function as a standalone model but also enhance other existing methods, providing stable performance improvements. Code and data are available at https://github.com/zjukg/SNAG.
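
The abstract names modality-level noise masking but does not spell out the procedure. As a rough illustration only, the sketch below shows one plausible reading: during training, each modality feature of an entity is, with some probability, blended with Gaussian noise before fusion. The function name, the parameters `mask_prob` and `noise_ratio`, their default values, and the linear blending rule are all assumptions made for this example, not details taken from the source or from the SNAG code base.

```python
import random


def gauss_modality_noise_mask(features, mask_prob=0.5, noise_ratio=0.5, rng=None):
    """Return a copy of `features` with some modalities blended with Gaussian noise.

    features:    dict mapping a modality name to a list of floats (one entity).
    mask_prob:   chance that a given modality is perturbed (hypothetical parameter,
                 default chosen arbitrarily for the example).
    noise_ratio: weight of the noise in the blend, in [0, 1] (hypothetical parameter,
                 default chosen arbitrarily for the example).
    rng:         optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    noisy = {}
    for modality, vec in features.items():
        if rng.random() < mask_prob:
            # Assumed blending rule: x' = (1 - r) * x + r * eps, with eps ~ N(0, 1).
            noisy[modality] = [
                (1.0 - noise_ratio) * x + noise_ratio * rng.gauss(0.0, 1.0)
                for x in vec
            ]
        else:
            # Modality left untouched; copy so the caller's data is never mutated.
            noisy[modality] = list(vec)
    return noisy


# Toy entity with two modalities (values are made up for illustration).
entity = {"image": [0.5, -1.0, 2.0], "text": [1.0, 0.0, -0.5]}

# mask_prob=0.0 never perturbs; mask_prob=1.0 always perturbs.
untouched = gauss_modality_noise_mask(entity, mask_prob=0.0)
all_noise = gauss_modality_noise_mask(
    entity, mask_prob=1.0, noise_ratio=1.0, rng=random.Random(0)
)
```

Under this reading, the encoder sees a mix of clean and perturbed modality features during training, which is one way a model could learn not to over-rely on any single noisy modality. Whether SNAG samples noise per entity, per batch, or from dataset statistics is not stated here.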
