NativE: Multi-modal Knowledge Graph Completion in the Wild

Yichi Zhang; Zhuo Chen; Lingbing Guo; Yajing Xu; Binbin Hu; Ziqi Liu; Wen Zhang; Huajun Chen

TL;DR

NativE tackles the diversity and imbalance challenges of multi-modal knowledge graph completion in the wild by introducing two core components: Relation-guided Dual Adaptive Fusion (ReDAF), which enables adaptive, relation-aware fusion of arbitrary modalities, and Collaborative Modality Adversarial Training (CoMAT), which augments imbalanced modality information via Wasserstein GAN-based adversarial learning. The framework jointly learns multi-modal entity representations and uses a RotatE-based score to assess triple plausibility, while a theoretical Lipschitz argument supports the adversarial design. A new WildKGC benchmark with five MMKGs shows that NativE achieves state-of-the-art results across diverse datasets and modality configurations, and ablation studies confirm the importance of each module. Additional analyses show that CoMAT generalizes to other MMKGC models and examine the efficiency of the framework. Overall, NativE offers a generalizable approach to MMKGC in the wild that can use a broad range of modalities and cope with uneven modality distributions.
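To make the "RotatE-based score" concrete, here is a minimal sketch of the standard RotatE scoring function, in which a relation acts as an element-wise rotation in complex space. It illustrates the general technique only; the exact variant used in NativE, the function name `rotate_score`, and the default `gamma` are assumptions made for this example.

```python
import numpy as np

def rotate_score(head, rel_phase, tail, gamma=0.0):
    """Standard RotatE plausibility score: gamma - ||h * r - t||.

    head, tail: complex embedding vectors of shape (d,)
    rel_phase:  real vector of rotation angles, shape (d,)
    gamma:      margin offset (hypothetical default, not from the paper)
    """
    rotation = np.exp(1j * rel_phase)      # unit-modulus complex numbers
    diff = head * rotation - tail          # element-wise rotation, then offset
    return gamma - np.abs(diff).sum()      # sum of complex moduli (L1-style norm)

# Toy check: a tail that is exactly the rotated head gets the highest score.
rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
theta = rng.uniform(-np.pi, np.pi, size=d)
t_true = h * np.exp(1j * theta)
t_rand = rng.normal(size=d) + 1j * rng.normal(size=d)

score_true = rotate_score(h, theta, t_true)
score_rand = rotate_score(h, theta, t_rand)
```

In this sketch a higher score means a more plausible triple, so `score_true` is (numerically) zero and `score_rand` is negative.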

Abstract

Multi-modal knowledge graph completion (MMKGC) aims to automatically discover unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and the multi-modal information of entities. However, real-world MMKGs are diverse and imbalanced: the modality information can span various types (e.g., image, text, numeric, audio, video), but its distribution among entities is uneven, leaving some entities with missing modalities. Existing works usually focus on common modalities like image and text while neglecting the imbalanced distribution of modality information. To address these issues, we propose a comprehensive framework, NativE, to achieve MMKGC in the wild. NativE introduces a relation-guided dual adaptive fusion module that enables adaptive fusion for any modality and employs a collaborative modality adversarial training framework to augment the imbalanced modality information. We construct a new benchmark called WildKGC with five datasets to evaluate our method. Empirical comparisons with 21 recent baselines confirm the superiority of our method, which consistently achieves state-of-the-art performance across different datasets and scenarios while remaining efficient and generalizable. Our code and data are released at https://github.com/zjukg/NATIVE

Paper Structure (28 sections, 14 equations, 6 figures, 4 tables)

Introduction
Related Works
Knowledge Graph Completion
Multi-modal Knowledge Graph Completion
Generative Adversarial Networks
Task Definition
Methodology
Modality Encoding
Relation-guided Dual Adaptive Fusion
Collaborative Modality Adversarial Training
The Design of Generator
The Design of Discriminator
Overall Training Objective
Theoretical Analysis
Experiments and Evaluation
...and 13 more sections

Figures (6)

Figure 1: The diverse and imbalanced nature of MMKGs. We report the modalities included in each MMKG in (a) and the statistical information about the modality information distribution across dataset/entity in TIVA in (b).
Figure 2: The overview of our NativE framework. NativE consists of two main modules: the relation-guided dual adaptive fusion (ReDAF) module and the collaborative modality adversarial training (CoMAT) module. ReDAF is designed to fuse any input modality with modality-adaptive weights and relational guidance. CoMAT aims to augment the imbalanced modality information in an adversarial manner by constructing synthetic triples to play a min-max game.
Figure 3: The imbalanced MMKGC results. We report the MRR and Hit@10 results on the DB15K dataset. Further, we divide the test triples into three groups according to whether their entities have complete modality information and tally the results for each group separately, where: Group1 (both h and t are modality-complete); Group2 (one of h, t is modality-missing); Group3 (both h and t are modality-missing).
Figure 4: The generalization experiments of the CoMAT module on three different MMKGC models. We report the MRR and Hit@1 results on the DB15K dataset.
Figure 5: The results of the efficiency experiment. We report the MRR and Hit@1 results on the KVC16K/DB15K datasets.
...and 1 more figures
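
The grouping described in the Figure 3 caption can be sketched as follows. This is an illustrative reading, not the authors' evaluation code: it assumes Group 2 means exactly one of the head or tail entity lacks modality information, and the names `modality_group`, `grouped_metrics`, and the toy data are hypothetical.

```python
from collections import defaultdict

def modality_group(head, tail, complete):
    """Return 1, 2 or 3: one plus the number of modality-missing entities."""
    missing = (head not in complete) + (tail not in complete)
    return missing + 1

def grouped_metrics(ranked, complete, k=10):
    """Compute MRR and Hit@k per group.

    ranked:   iterable of (head, tail, rank), where rank is the 1-based
              position of the true entity in the model's ranking
    complete: set of entities with complete modality information
    """
    ranks = defaultdict(list)
    for head, tail, rank in ranked:
        ranks[modality_group(head, tail, complete)].append(rank)
    return {
        group: {
            "MRR": sum(1.0 / r for r in rs) / len(rs),
            f"Hit@{k}": sum(r <= k for r in rs) / len(rs),
        }
        for group, rs in ranks.items()
    }

# Toy example: entities "a" and "b" are modality-complete, "c" and "d" are not.
complete = {"a", "b"}
ranked = [("a", "b", 1), ("a", "c", 2), ("c", "d", 20), ("b", "a", 4)]
metrics = grouped_metrics(ranked, complete)
```

Reporting the metrics per group, rather than over the whole test set, is what lets the figure show how much performance depends on modality completeness.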
