Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment

Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan

TL;DR

This work tackles multi-modal entity alignment across heterogeneous MMKGs by introducing MIMEA, a four-module framework that explicitly models intra-modal and inter-modal interactions. It combines a multi-modal knowledge embedding stage (via GAT and MLPs) with probabilistic fusion (Beta distributions), optimal transport-based alignment, and modal-adaptive contrastive learning to produce robust joint-modal representations. Extensive experiments on FB15K-DB15K and FB15K-YAGO15K demonstrate state-of-the-art performance and robustness to seed availability, with ablations confirming the critical role of structural information and inter-modal dynamics. Overall, MIMEA advances MMKG integration by systematically exploiting multi-granular interactions, achieving strong alignment accuracy while maintaining favorable computational efficiency. Future work includes addressing incomplete structural knowledge through KG completion.

Abstract

Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs). Existing approaches focus on how to better encode and aggregate information from different modalities. However, leveraging multi-modal knowledge in entity alignment is not trivial because of modal heterogeneity. In this paper, we propose a Multi-Grained Interaction framework for Multi-Modal Entity Alignment (MIMEA), which effectively realizes multi-granular interaction within the same modality or between different modalities. MIMEA is composed of four modules: i) a Multi-modal Knowledge Embedding module, which extracts modality-specific representations with multiple individual encoders; ii) a Probability-guided Modal Fusion module, which employs a probability-guided approach to integrate uni-modal representations into joint-modal embeddings while considering the interaction between uni-modal representations; iii) an Optimal Transport Modal Alignment module, which introduces an optimal transport mechanism to encourage interaction between uni-modal and joint-modal embeddings; iv) a Modal-adaptive Contrastive Learning module, which, for each modality, distinguishes the embeddings of equivalent entities from those of non-equivalent ones. Extensive experiments conducted on two real-world datasets demonstrate the strong performance of MIMEA compared to the state of the art (SoTA). Datasets and code have been submitted as supplementary materials.

Paper Structure

This paper contains 14 sections, 5 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: The MMEA task between MMKG1 and MMKG2, aligning the entities Lionel Messi and Leo Messi.
  • Figure 2: MIMEA's architecture, containing the modules: Probability-guided Modal Fusion, Optimal Transport Modal Alignment, and Modal-adaptive Contrastive Learning.