Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Yichi Zhang; Zhuo Chen; Lingbing Guo; Yajing Xu; Binbin Hu; Ziqi Liu; Wen Zhang; Huajun Chen

TL;DR

MoMoK is a relation-guided mixture of modality knowledge experts framework for multi-modal knowledge graph completion (MMKGC). It combines per-modality experts (ReMoKE), a multi-modal joint decision mechanism (MuJoD), and an expert information disentanglement module (ExID) to fuse modalities adaptively under relational context. Using relation-aware gating, Tucker-based per-modality scoring, and CLUB-based mutual-information regularization, the approach reports state-of-the-art results on four public MMKG benchmarks and remains robust to modality noise, missing data, and data sparsity. Adaptive modality weights and case studies make the model's behavior interpretable, showing that different relations rely on different modalities and experts. Overall, MoMoK advances MMKGC by modeling relational context explicitly with modular, specialized experts and principled disentanglement, with potential for integration into larger multimodal systems.
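
As a rough illustration of what "Tucker-based scoring" refers to, the sketch below computes the standard Tucker-style trilinear score of a triple: a core tensor contracted with a head embedding, a relation embedding, and a tail embedding. The function name `tucker_score`, the toy dimensions, and all numeric values are hypothetical. How MoMoK parameterizes the core tensor or shares it across modalities is not stated in this summary, so this is a sketch of the general technique, not the authors' implementation.

```python
def tucker_score(core, head, rel, tail):
    """Tucker-style trilinear score:
    sum over i, j, k of core[i][j][k] * head[i] * rel[j] * tail[k].
    """
    total = 0.0
    for i, h_i in enumerate(head):
        for j, r_j in enumerate(rel):
            for k, t_k in enumerate(tail):
                total += core[i][j][k] * h_i * r_j * t_k
    return total


# Toy example (all values are made up): entity dim 2, relation dim 2.
core = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
head = [1.0, 2.0]   # hypothetical modality embedding of the head entity
rel = [0.5, -1.0]   # hypothetical relation embedding
tail = [3.0, 1.0]   # hypothetical modality embedding of the tail entity

score = tucker_score(core, head, rel, tail)
print(score)
```

In a per-modality setting, one such score would be computed for each modality's pair of entity embeddings; a higher score marks a more plausible triple.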

Abstract

Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning, which can enhance reasoning tasks within MMKGs, such as MMKG completion (MMKGC). The main challenge is to collaboratively model the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal entity representations for better MMKGC. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multiple modalities to achieve joint decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.
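
The idea of relation-guided experts can be pictured with a minimal sketch: several experts transform the same modality feature, and a gate conditioned on the relation decides how to weight their outputs. Everything named here (`relation_guided_mixture`, `gate_keys`, linear experts, a gate that depends only on the relation embedding) is an assumption made for illustration; the abstract does not specify the gating function or the expert architecture.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def relation_guided_mixture(feature, rel_emb, experts, gate_keys):
    """Mix K expert outputs with weights computed from the relation.

    experts:   K weight matrices (hypothetical linear experts)
    gate_keys: K vectors; gate logit k = <gate_keys[k], rel_emb> (assumed)
    """
    weights = softmax([dot(key, rel_emb) for key in gate_keys])
    outputs = [matvec(w, feature) for w in experts]
    dim = len(outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(weights, outputs))
             for d in range(dim)]
    return mixed, weights


# Toy setup (made-up values): one modality feature, K = 3 experts.
feature = [1.0, 2.0]
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity
    [[2.0, 0.0], [0.0, 2.0]],  # scaling
    [[0.0, 1.0], [1.0, 0.0]],  # coordinate swap
]
gate_keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

mixed_a, weights_a = relation_guided_mixture(feature, [2.0, 0.0], experts, gate_keys)
mixed_b, weights_b = relation_guided_mixture(feature, [0.0, 2.0], experts, gate_keys)
print(weights_a, mixed_a)
print(weights_b, mixed_b)
```

The same entity feature yields different embeddings under the two relations because the gate shifts weight between experts, which is the relation-aware behavior the abstract describes.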

Paper Structure (37 sections, 18 equations, 5 figures, 8 tables)

Introduction
Related Works
Multi-modal Knowledge Graph Completion (MMKGC)
Mixture-of-Experts (MoE)
Problem Definition
Methodology
Relation-guided Modality Knowledge Experts
Multi-modal Joint Decision
Expert Information Disentanglement
Training and Inference
Experiments and Evaluation
Datasets
Experimental Settings
Baseline Methods
Task and Evaluation Protocols
...and 22 more sections

Figures (5)

Figure 1: Different relational context requires different modality information for proper prediction.
Figure 2: Overview of our proposed MoMoK framework, which consists of three core components: the relation-guided modality knowledge experts (ReMoKE), multi-modal joint decision (MuJoD), and expert information disentanglement (ExID).
Figure 3: MMKGC results (MRR and Hit@10) on the DB15K dataset under three different scenarios: noisy modality, missing modality, and sparse links. We compare our method MoMoK with three recent MMKGC baselines: AdaMF, TBKGC, and QBE.
Figure 4: Additional parameter analysis of the number of MoKEs and the weight $\lambda$ of $\mathcal{L}_{club}$.
Figure 5: Attention weight visualization results. We select some relations and present the weight of each modality contributing to the joint representation $\widehat{e}_{Joint}$. We further present the weights $G_i$ for the $K$ ($K=3$) ReMoKEs in the modality outputs $\widehat{\bm{e}}_m$. Abbreviations for modalities: Structure (STR), Image (IMG), Text (TXT). M.k in the legend denotes the k-th expert of modality M.
