Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion

Yu Zhao, Ying Zhang, Baohang Zhou, Xinying Qian, Kehui Song, Xiangrui Cai

TL;DR

This work addresses inductive multimodal knowledge graph completion (IMKGC) by introducing the Contrastive learning, Memorization, and Retrieval (CMR) framework. CMR combines three components: unified cross-modal contrastive learning that aligns textual and visual signals, a memorization mechanism that stores knowledge representations for semantic neighbor retrieval, and a retrieval-augmented inference step that interpolates semantic-neighbor signals with the direct query-entity similarity. The approach is reported to outperform both inductive and transductive baselines on three IMKGC datasets, and the paper backs this with ablations, case studies, and visualization analyses of the contribution of multimodal cues and semantic neighbor retrieval. The results indicate that explicitly retrieving semantic neighbors and aligning modalities improves generalization to unseen entities, which matters for real-world, evolving knowledge graphs.

Abstract

A large number of studies have emerged on Multimodal Knowledge Graph Completion (MKGC), which predicts the missing links in MKGs. However, few studies address inductive MKGC (IMKGC), which involves emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations and thus neglect the rich semantic information in the visual modality. Moreover, they focus on aggregating structural neighbors from existing KGs, and emerging entities usually have few such neighbors. Semantic neighbors, in contrast, are decoupled from the topological linkage and usually imply the true target entity. In this paper, we propose the IMKGC task and a semantic neighbor retrieval-enhanced IMKGC framework, CMR, in which the contrast step brings helpful semantic neighbors close together and the memorize step supports semantic neighbor retrieval to enhance inference. Specifically, we first propose unified cross-modal contrastive learning to simultaneously capture the textual-visual and textual-textual correlations of query-entity pairs in a unified representation space. The contrastive learning increases the similarity of positive query-entity pairs, thereby bringing the representations of helpful semantic neighbors close together. Then, we explicitly memorize the knowledge representations to support semantic neighbor retrieval. At test time, we retrieve the nearest semantic neighbors and interpolate them with the query-entity similarity distribution to augment the final prediction. Extensive experiments validate the effectiveness of CMR on three inductive MKGC datasets. Code is available at https://github.com/OreOZhao/CMR.
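The contrastive objective described in the abstract can be illustrated with a small sketch. The code below is not taken from the CMR repository. It assumes an InfoNCE-style loss with in-batch negatives, and it assumes that the textual-textual and textual-visual terms are simply summed. The function names (`info_nce`, `unified_loss`), the temperature value, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, entities, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    entities[i] is the positive for queries[i]; every other entity in the
    batch serves as a negative (an assumption of this sketch).
    """
    q = [l2_normalize(v) for v in queries]
    e = [l2_normalize(v) for v in entities]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, ej) / temperature for ej in e]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def unified_loss(queries, text_entities, image_entities, temperature=0.05):
    """Assumed combination: textual-textual term plus textual-visual term."""
    return (info_nce(queries, text_entities, temperature)
            + info_nce(queries, image_entities, temperature))


# Toy batch: two query embeddings with matching and mismatched entity embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # positives point the same way as their queries
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives point away from their queries

loss_aligned = info_nce(queries, aligned)
loss_swapped = info_nce(queries, swapped)
loss_unified = unified_loss(queries, aligned, aligned)
```

In this toy batch, the loss is lower when each query is close to its positive entity than when the positives are swapped. This is the behavior the abstract relies on: minimizing such a loss pulls positive query-entity pairs together in the shared space.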

Paper Structure

This paper contains 37 sections, 14 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: We propose to aggregate semantic neighbors to augment inductive MKGC. Moreover, we propose to capture both textual-textual and textual-visual correlations in the query-entity pairs.
  • Figure 2: Contrastive learning, Memorization, and Retrieval (CMR) framework for Inductive Multimodal Knowledge Graph Completion. Contrastive learning optimizes the bi-encoders to shorten the distance between query-entity pairs and to capture multimodal semantic correlation in a unified representation space. Memorization explicitly memorizes the knowledge representations after training. The retrieval from the knowledge store aggregates the semantic neighbors and interpolates them to the query-entity similarity distribution for the final prediction at test time.
  • Figure 3: Token-wise similarity of query and ground-truth entity e1 versus query and irrelevant entity e2.
  • Figure 4: Performance with different semantic neighbor count $k$ and interpolation parameter $\lambda$ on FB15K-237 ind.
  • Figure 5: The t-SNE (van der Maaten and Hinton, 2008) visualization of unseen entity embeddings with different entity types.
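
The roles of the neighbor count $k$ and the interpolation parameter $\lambda$ in Figure 4 can be illustrated with a minimal sketch of retrieval-augmented inference. It is not the authors' implementation. It assumes a kNN-LM-style rule, p = λ · p_kNN + (1 − λ) · p_model, and it assumes that retrieved neighbors are weighted by a softmax over their similarities. The knowledge-store layout (a list of key vector and target entity id pairs), the function names, and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def knn_distribution(query, store, num_entities, k):
    """Distribution over entities induced by the k nearest memorized keys.

    store is a list of (key_vector, target_entity_id) pairs saved after training.
    """
    nearest = sorted(((dot(query, key), target) for key, target in store),
                     reverse=True)[:k]
    weights = softmax([sim for sim, _ in nearest])
    p = [0.0] * num_entities
    for w, (_, target) in zip(weights, nearest):
        p[target] += w  # neighbors that share a target pool their weight
    return p


def interpolate(p_model, p_knn, lam):
    """Assumed rule: lam * p_knn + (1 - lam) * p_model."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_model)]


def argmax(p):
    return max(range(len(p)), key=p.__getitem__)


# Toy setup with three candidate entities.
query = [1.0, 0.0]
entity_embs = [[0.6, 0.8], [0.8, 0.6], [0.0, 1.0]]
store = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 2)]

p_model = softmax([dot(query, e) for e in entity_embs])   # direct query-entity similarity
p_knn = knn_distribution(query, store, num_entities=3, k=2)
p_final = interpolate(p_model, p_knn, lam=0.5)

pred_without_retrieval = argmax(p_model)
pred_with_retrieval = argmax(p_final)
```

In this toy setup, direct similarity alone ranks entity 1 first, while the two nearest memorized neighbors both point to entity 0, so interpolation with λ = 0.5 changes the top prediction. Setting λ = 0 recovers the similarity-only prediction, which is one way to read the sensitivity curves that Figure 4 reports.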