Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen
TL;DR
MoMoK is a relation-guided mixture-of-modality-knowledge-experts framework for multi-modal knowledge graph completion (MMKGC). It combines per-modality experts (ReMoKE), a multi-modal joint decision mechanism (MuJoD), and an expert-information disentanglement module (ExID) to fuse modalities adaptively under relational context. Using relation-aware gating, Tucker-based per-modality scoring, and CLUB-based mutual-information regularization, the approach achieves state-of-the-art results on four public MMKG benchmarks and remains robust to modality noise, missing data, and data sparsity. The adaptive modality weights and case studies also make the model interpretable: they show that different relations rely on different modalities and expert heads. Overall, MoMoK advances MMKGC by modeling relational context explicitly with modular, specialized experts and mutual-information-based disentanglement, and it could be integrated with larger multi-modal systems.
Abstract
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning, which can enhance reasoning tasks within MMKGs, such as MMKG completion (MMKGC). The main challenge is to collaboratively model the structural information concealed in massive numbers of triples and the multi-modal features of the entities. Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal entity representations for better MMKGC. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multiple modalities to achieve joint decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.
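The idea of relation-guided experts can be illustrated with a minimal sketch. It is not the authors' implementation: the linear experts, the softmax gate driven only by the relation embedding, and all names (`relation_guided_embedding`, `experts`, `gate`) are assumptions made for illustration. The sketch shows how the same entity feature can yield different embeddings under different relations, because the relation decides how much each expert contributes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relation_guided_embedding(feature, relation, experts, gate):
    """Mix expert outputs using weights computed from the relation embedding.

    Hypothetical simplification: each expert is a linear map, and the gate
    is a linear map from the relation embedding to one logit per expert.
    """
    weights = softmax(matvec(gate, relation))        # one weight per expert
    outputs = [matvec(W, feature) for W in experts]  # one vector per expert
    dim = len(outputs[0])
    mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
             for i in range(dim)]
    return mixed, weights


# Toy setup: one modality feature and two experts (made-up numbers).
feature = [1.0, 2.0]
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity map
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two coordinates
]
gate = [[1.0, 0.0], [0.0, 1.0]]  # expert k's logit is relation[k]

# Two different relation embeddings favour different experts.
emb_a, w_a = relation_guided_embedding(feature, [5.0, 0.0], experts, gate)
emb_b, w_b = relation_guided_embedding(feature, [0.0, 5.0], experts, gate)
```

In this toy setup the first relation puts most of the weight on expert 0 and the second on expert 1, so the entity's embedding changes with the relational context even though its raw feature does not.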
