IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu
TL;DR
This work tackles multi-modal entity alignment (MMEA) by addressing the alignment-irrelevant information carried by modalities such as images, relations, and attributes. It introduces IBMEA, a variational-information-bottleneck framework that learns per-modality latent representations and enforces two kinds of regularizers: (i) modal-specific information bottleneck regularizers that suppress redundant cues, and (ii) a modal-hybrid information contrastive regularizer that fuses the modalities while maximizing cross-graph alignment signals. The model uses a multi-modal variational encoder (including a variational graph encoder) to produce Gaussian latent variables per modality, and its tractable optimization objective is derived from variational bounds on mutual information. Experiments on five benchmarks (two cross-KG and three bilingual datasets) show IBMEA achieving state-of-the-art results, particularly under low-resource and noisy conditions, and ablation analyses confirm the contribution of each regularizer and modality. Overall, the approach provides a principled, robust mechanism for extracting alignment-relevant information from heterogeneous signals, enabling more reliable MMKG fusion and entity alignment in realistic settings.
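For orientation, the generic variational information bottleneck that such a framework builds on can be written as follows. This is the standard textbook form, not necessarily the exact objective derived in the paper; the symbols $X_m$ (input of modality $m$), $Z_m$ (its latent), $Y$ (alignment signal), $\beta$ (trade-off weight), and the prior $r(z_m)$ are notational assumptions introduced here.

```latex
% Information bottleneck trade-off for one modality m (assumed notation)
\[
\max_{\theta}\; I(Z_m; Y) \;-\; \beta\, I(Z_m; X_m)
\]
% Variational upper bound on the compression term, using a prior r(z_m)
\[
I(Z_m; X_m) \;\le\; \mathbb{E}_{x_m}\!\left[ \mathrm{KL}\!\left( q_\theta(z_m \mid x_m) \,\|\, r(z_m) \right) \right]
\]
% Closed form when q is a diagonal Gaussian and r is a standard normal
\[
\mathrm{KL}\!\left( \mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I) \right)
= \frac{1}{2} \sum_{d=1}^{D} \left( \mu_d^2 + \sigma_d^2 - 1 - \log \sigma_d^2 \right)
\]
```

The closed-form KL term is what makes Gaussian latent variables convenient: the compression penalty can be computed directly from the encoder outputs without sampling.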
Abstract
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where entities can be associated with related images. Most existing studies integrate multi-modal information by relying heavily on an automatically learned fusion module, and rarely suppress redundant information for MMEA explicitly. To this end, we explore the variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes alignment-relevant information and suppresses alignment-irrelevant information when generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, which limit misleading clues when refining the modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods, and also shows promising and robust performance in low-resource and high-noise data scenarios.
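The pipeline described above (Gaussian modal-specific representations, a bottleneck penalty per modality, and a contrastive term over fused representations) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the concatenation-based fusion, the standard-normal prior, the InfoNCE form of the contrastive term, and the values of `beta` and `temperature` are all assumptions made for illustration.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(source, target, temperature=0.1):
    """Mean InfoNCE loss; source[i] and target[i] are assumed aligned."""
    total = 0.0
    for i, s in enumerate(source):
        logits = [dot(s, t) / temperature for t in target]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(source)


def fuse(modal_params, rng):
    """Hypothetical fusion: concatenate one sample per modality."""
    z = []
    for mu, log_var in modal_params:
        z.extend(sample_gaussian(mu, log_var, rng))
    return z


def sketch_loss(src_entities, tgt_entities, beta=0.01, seed=0):
    """Contrastive term plus beta-weighted KL penalties over all modalities.

    Each entity is a list of (mu, log_var) pairs, one pair per modality.
    """
    rng = random.Random(seed)
    src = [fuse(e, rng) for e in src_entities]
    tgt = [fuse(e, rng) for e in tgt_entities]
    kl = sum(kl_to_standard_normal(mu, lv)
             for e in src_entities + tgt_entities
             for mu, lv in e)
    return info_nce(src, tgt) + beta * kl
```

In this sketch, raising `beta` pushes every modal-specific distribution toward the prior (stronger compression), while the contrastive term rewards fused representations of aligned entities for being more similar to each other than to other candidates.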
