Zero-Shot Relational Learning for Multimodal Knowledge Graphs
Rui Cai, Shichao Pei, Xiangliang Zhang
TL;DR
This work tackles zero-shot relational learning in multimodal knowledge graphs, where new relations must be inferred without training triples. It introduces MRE (Multimodal Relation Extrapolation), an end-to-end framework composed of a Multimodal Learner, a Structure Consolidator, and a Relational Embedding Generator to fuse image/text signals with KG topology and generate embeddings for unseen relations. The Multimodal Learner aligns visual and textual modalities via a masked autoencoder, the Structure Consolidator injects structural KG information through a GNN, and the Relational Embedding Generator employs a GAN-based objective to map relation descriptions to embeddings, enabling zero-shot inference. Across FB15K-237-ZS, DB15K-ZS, and WN18-IMG-ZS, MRE outperforms strong baselines, demonstrating that multimodal signals plus structural context substantially improve extrapolation of unseen relations. The approach advances practical KG maintenance by enabling plausible reasoning for newly discovered relations without training triples, with a noted direction for future work on leveraging multiple images per entity.
Abstract
Relational learning is an essential task in the domain of knowledge representation, particularly in knowledge graph completion (KGC). While relational learning in traditional single-modal settings has been extensively studied, exploring it within a multimodal KGC context presents distinct challenges and opportunities. One of the major challenges is inference on newly discovered relations without any associated training data. This zero-shot relational learning scenario poses a unique requirement for multimodal KGC, namely, using multimodality to facilitate relational learning. However, existing works do not leverage multimodal information for this purpose, leaving the problem unexplored. In this paper, we propose a novel end-to-end framework consisting of three components, i.e., a multimodal learner, a structure consolidator, and a relation embedding generator, which integrates diverse multimodal information and knowledge graph structure to facilitate zero-shot relational learning. Evaluation results on three multimodal knowledge graphs demonstrate the superior performance of our proposed method.
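
To make the zero-shot setting concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that an unseen relation is represented only by a feature vector derived from its description, that a generator maps this vector to a relation embedding, and that candidate tail entities are ranked with a TransE-style distance score. The fixed linear map standing in for the trained generator, the TransE-style score, and all function names and toy vectors are assumptions introduced for illustration.

```python
import math


def generate_relation_embedding(desc_vec, weight):
    """Hypothetical stand-in for a trained generator: a fixed linear map
    from relation-description features to a relation embedding."""
    return [sum(w * x for w, x in zip(row, desc_vec)) for row in weight]


def transe_score(head, rel, tail):
    """TransE-style plausibility (an assumption, not taken from the paper):
    the negative L2 distance between head + rel and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


def rank_tails(head, rel, candidates):
    """Rank candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, rel, candidates[name]),
        reverse=True,
    )


# Toy 2-D example: the relation has no training triples, only a description vector.
weight = [[1.0, 0.0], [0.0, 1.0]]   # placeholder generator parameters
desc_vec = [1.0, 0.0]               # placeholder description features
head = [0.0, 0.0]                   # placeholder head-entity embedding
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}

rel = generate_relation_embedding(desc_vec, weight)
ranking = rank_tails(head, rel, candidates)
```

In this toy setup, the candidate whose embedding lies closest to head + rel is ranked first. In the framework described above, the entity embeddings would instead carry fused image, text, and structural information, and the generator would be learned rather than fixed.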
