IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

Taoyu Su; Jiawei Sheng; Shicheng Wang; Xinghua Zhang; Hongbo Xu; Tingwen Liu

TL;DR

This work tackles multi-modal entity alignment (MMEA) by addressing alignment-irrelevant information carried by modalities such as images, relations, and attributes. It introduces IBMEA, a variational-information-bottleneck framework that learns per-modality latent representations and enforces two regularizers: (i) modal-specific information bottleneck regularizers to suppress redundant cues, and (ii) a modal-hybrid information contrastive regularizer to fuse modalities while maximizing cross-graph alignment signals. The model uses a multi-modal variational encoder (including a variational graph encoder) to produce Gaussian latent variables per modality, with a tractable optimization objective derived from variational bounds on mutual information. Experiments on five benchmarks (two cross-KG and three bilingual datasets) show IBMEA achieving state-of-the-art results, particularly under low-resource and noisy conditions, and ablation analyses confirm the effectiveness of each regularizer and modality. Overall, the approach provides a principled, robust mechanism to extract alignment-relevant information from heterogeneous signals, enabling more reliable MMKG fusion and entity alignment in realistic settings.
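The Gaussian latent variables and the bottleneck regularizer mentioned above can be illustrated with a minimal sketch. It assumes a diagonal Gaussian posterior per modality and a standard normal prior, which is the usual variational information bottleneck setup; the exact bound used by IBMEA is not reproduced on this page. The function names and the example values of `mu` and `log_var` are hypothetical.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions.

    Assumption: the prior is a standard normal, as in the common VIB recipe.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu = [0.5, -1.0, 0.0]        # hypothetical encoder output: mean
log_var = [0.0, -0.5, 0.2]   # hypothetical encoder output: log-variance
z = sample_latent(mu, log_var, rng)       # stochastic modal-specific representation
kl = kl_to_standard_normal(mu, log_var)   # penalty that compresses the representation
```

Minimizing the KL term pushes each modality's latent distribution toward the prior, so only information that helps the alignment objective is worth the cost of keeping.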

Abstract

Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies rely heavily on an automatically learned fusion module to integrate multi-modal information, and rarely suppress the redundant information for MMEA explicitly. To this end, we explore variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes the alignment-relevant information and suppresses the alignment-irrelevant information when generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, which limit the misleading clues when refining modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods, and also shows promising and robust performance in low-resource and high-noise data scenarios.
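As a rough illustration of a contrastive regularizer that raises the similarity of aligned entities across two graphs, the sketch below computes an InfoNCE-style loss, a standard lower-bound surrogate for mutual information. It is not taken from the paper: the use of cosine similarity, the `temperature` value, and the function names are assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(source, target, temperature=0.1):
    """Mean InfoNCE loss where source[i] and target[i] form an aligned pair.

    All other target entities in the batch act as negatives.
    """
    total = 0.0
    for i, s in enumerate(source):
        logits = [cosine(s, t) / temperature for t in target]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(source)

# Hypothetical fused entity embeddings from two knowledge graphs.
kg1 = [[1.0, 0.0], [0.0, 1.0]]
kg2 = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = info_nce(kg1, kg2)
shuffled_loss = info_nce(kg1, kg2[::-1])
```

The loss is lower when each entity is closest to its true counterpart (`aligned_loss`) than when the pairing is scrambled (`shuffled_loss`), which is the behaviour a contrastive alignment objective relies on.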

Paper Structure (32 sections, 15 equations, 7 figures, 3 tables)

Introduction
Preliminaries
Task Formulation
Information Bottleneck
Methodology
Multi-modal Variational Encoder
Variational Graph Encoder
Variational Visual, Attribute and Relation Encoder
Multi-modal Representation Implementation
Multi-modal Information Regularizer
Modal-specific Information Bottleneck Regularizer
Modal-hybrid Information Contrastive Regularizer
Tractable Optimization Objective
Tractable Information Bottleneck Objective
Tractable Information Contrastive Objective
...and 17 more sections

Figures (7)

Figure 1: An example of the MMEA task between MMKGs, where ImgSim denotes the similarity of the images. Given Entity_1 in MMKG-1, the model aims to predict Entity_2 from candidate entities in MMKG-2 as the true entity.
Figure 2: The framework of the proposed IBMEA for the multi-modal entity alignment task.
Figure 3: Results of removing different modalities on the FB15K-DB15K dataset. w/o means removing the modality.
Figure 4: Results in the low-resource data scenario with proportions of seed alignments on FB15K-DB15K dataset.
Figure 5: Results on the samples with low-similarity images in the FB15K-DB15K dataset.
...and 2 more figures
