SE-GNN: Seed Expanded-Aware Graph Neural Network with Iterative Optimization for Semi-supervised Entity Alignment
Tao Meng, Shuo Shan, Hongen Shao, Yuntao Shou, Wei Ai, Keqin Li
TL;DR
This work tackles semi-supervised entity alignment across knowledge graphs when seed signals are limited and noisy. It introduces SE-GNN, a framework with three components: seed expansion based on neighborhood-level semantic information, a local-global awareness mechanism (LGAM) that produces richer embeddings, and a threshold nearest neighbor embedding correction strategy, whose Xavier-based correction step mitigates embedding distortion. Experiments on DBP15K, SRPRS, and DWY100K show state-of-the-art performance compared with both traditional and semi-supervised baselines, and ablations confirm the contribution of seed expansion, LGAM, and embedding correction. The approach is practically relevant for scalable KG fusion: it reduces reliance on manually labeled seeds and is more resilient to noisy ones.
Abstract
Entity alignment uses pre-aligned seed pairs to find other equivalent entities across different knowledge graphs (KGs) and is widely used in graph fusion-related fields. However, as the scale of KGs increases, manually annotating pre-aligned seed pairs becomes difficult. Existing research identifies potential seed pairs from entity embeddings obtained by aggregating only a single type of structural information, thereby reducing the reliance on pre-aligned seed pairs. However, because KGs are structurally heterogeneous, the quality of potential seed pairs obtained from a single type of structural information is not ideal. In addition, although existing research improves the quality of potential seed pairs through semi-supervised iteration, it underestimates how much the embedding distortion produced by noisy seed pairs degrades alignment performance. To solve these problems, we propose a seed expanded-aware graph neural network with iterative optimization for semi-supervised entity alignment, named SE-GNN. First, we use the semantic attributes and structural features of entities, combined with a conditional filtering mechanism, to obtain high-quality initial potential seed pairs. Next, we design a local and global awareness mechanism. It introduces the initial potential seed pairs and combines local and global information to obtain a more comprehensive entity embedding representation, which alleviates the impact of structural heterogeneity across KGs and lays the foundation for optimizing the initial potential seed pairs. Then, we design a threshold nearest neighbor embedding correction strategy. It combines a similarity threshold with the bidirectional nearest neighbor method as a filtering mechanism to select iterative potential seed pairs, and it also uses an embedding correction strategy to eliminate the embedding distortion.
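The filtering step described above, a similarity threshold combined with bidirectional nearest neighbors, can be illustrated with a minimal sketch. This is not the authors' implementation: the use of cosine similarity, the threshold value of 0.8, and the function name select_seed_pairs are assumptions made only for illustration.

```python
import numpy as np

def select_seed_pairs(emb_src, emb_tgt, threshold=0.8):
    """Hypothetical helper: keep (i, j) only if i and j are each other's
    nearest neighbor AND their similarity reaches the threshold.
    Cosine similarity and the default threshold are assumptions."""
    # L2-normalise rows so that the dot product equals cosine similarity.
    src = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    tgt = emb_tgt / np.linalg.norm(emb_tgt, axis=1, keepdims=True)
    sim = src @ tgt.T                      # shape: (n_src, n_tgt)

    nearest_tgt = sim.argmax(axis=1)       # best target for each source entity
    nearest_src = sim.argmax(axis=0)       # best source for each target entity

    pairs = []
    for i, j in enumerate(nearest_tgt):
        is_mutual = nearest_src[j] == i    # bidirectional nearest neighbor check
        if is_mutual and sim[i, j] >= threshold:
            pairs.append((i, int(j)))
    return pairs
```

In this sketch, a source entity that is close to several targets but is not the nearest neighbor of any of them is discarded, which is the intuition behind using the bidirectional check to limit noisy seed pairs; raising the threshold makes the selection stricter.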
