NativE: Multi-modal Knowledge Graph Completion in the Wild

Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen

TL;DR

NativE tackles the diversity and imbalance challenges of multi-modal knowledge graph completion in the wild by introducing two core components: Relation-guided Dual Adaptive Fusion (ReDAF), which enables adaptive, relation-aware fusion of arbitrary modalities, and Collaborative Modality Adversarial Training (CoMAT), which augments imbalanced modality information via Wasserstein GAN-based adversarial learning. The framework jointly learns multi-modal entity representations and leverages a RotatE-based score to assess triple plausibility, while a theoretical Lipschitz argument supports the adversarial design. A new WildKGC benchmark with five MMKGs demonstrates that NativE achieves state-of-the-art results across diverse datasets and modality configurations, and ablation studies confirm the importance of each module. Additional analyses show CoMAT’s generality across other MMKGC models and provide insights into efficiency and practical deployment, highlighting substantial improvements in real-world MMKGC tasks. Overall, NativE offers a scalable, generalizable approach to MMKGC in the wild, capable of leveraging broad modality spectra and coping with uneven modality distributions.
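
To make the "RotatE-based score" mentioned above concrete, here is a minimal sketch of the standard RotatE scoring function, in which a relation acts as a rotation in the complex plane and a triple is more plausible when the rotated head lands close to the tail. The function name `rotate_score`, the margin parameter `gamma`, and the use of plain Python lists of complex numbers are illustrative assumptions; how NativE feeds its fused multi-modal entity embeddings into this score is not specified in this summary and is not reproduced here.

```python
import cmath


def rotate_score(head, rel_phases, tail, gamma=0.0):
    """Standard RotatE plausibility: gamma - sum_i |h_i * exp(i*theta_i) - t_i|.

    head, tail : sequences of complex numbers (entity embeddings)
    rel_phases : sequence of rotation angles in radians (relation embedding)
    gamma      : margin hyperparameter (hypothetical default of 0.0)
    """
    if not (len(head) == len(rel_phases) == len(tail)):
        raise ValueError("head, rel_phases and tail must have the same length")
    distance = 0.0
    for h_i, theta_i, t_i in zip(head, rel_phases, tail):
        rotation = cmath.exp(1j * theta_i)  # unit-modulus complex number
        distance += abs(h_i * rotation - t_i)  # per-dimension modulus
    return gamma - distance


# Toy example: rotating each head coordinate by 90 degrees yields the tail,
# so the distance is close to zero and the score is close to gamma.
head = [1 + 0j, 0 + 1j]
phases = [cmath.pi / 2, cmath.pi / 2]
tail = [0 + 1j, -1 + 0j]
matching = rotate_score(head, phases, tail)
mismatched = rotate_score(head, phases, [1 + 0j, 1 + 0j])
```

In this toy setup `matching` is higher than `mismatched`, which is the ordering a link-prediction model relies on when ranking candidate tails.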

Abstract

Multi-modal knowledge graph completion (MMKGC) aims to automatically discover unobserved factual knowledge from a given multi-modal knowledge graph by collaboratively modeling the triple structure and the multi-modal information of entities. However, real-world MMKGs are challenging because they are diverse and imbalanced: the modality information can span various types (e.g., image, text, numeric, audio, video), but its distribution among entities is uneven, leaving some entities with missing modalities. Existing works usually focus on common modalities like image and text while neglecting the imbalanced distribution of modality information. To address these issues, we propose NativE, a comprehensive framework for MMKGC in the wild. NativE introduces a relation-guided dual adaptive fusion module that enables adaptive fusion of any modalities and employs a collaborative modality adversarial training framework to augment the imbalanced modality information. We construct a new benchmark called WildKGC with five datasets to evaluate our method. Empirical comparisons with 21 recent baselines confirm the superiority of our method, which consistently achieves state-of-the-art performance across different datasets and scenarios while remaining efficient and generalizable. Our code and data are released at https://github.com/zjukg/NATIVE

Paper Structure (28 sections, 14 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The diverse and imbalanced nature of MMKGs. Panel (a) reports the modalities included in each MMKG, and panel (b) reports statistics on the distribution of modality information across dataset/entity in TIVA.
  • Figure 2: The overview of our NativE framework. NativE consists of two main modules: the relation-guided dual adaptive fusion (ReDAF) module and the collaborative modality adversarial training (CoMAT) module. ReDAF is designed to fuse any input modality with modality-adaptive weights and relational guidance. CoMAT aims to augment the imbalanced modality information in an adversarial manner by constructing synthetic triples to play a min-max game.
  • Figure 3: The imbalanced MMKGC results. We report the MRR and Hit@10 results on the DB15K dataset. Further, we divide the test triples into three groups according to whether their entities have complete modality information and report the results for each group separately: Group1 (both h and t are modality-complete); Group2 (one of h and t is modality-missing); Group3 (both h and t are modality-missing).
  • Figure 4: The generalization experiments of the CoMAT module on three different MMKGC models. We report the MRR and Hit@1 results on the DB15K dataset.
  • Figure 5: The results of the efficiency experiment. We report the MRR and Hit@1 results on the KVC16K/DB15K datasets.
  • ...and 1 more figures