SE-GNN: Seed Expanded-Aware Graph Neural Network with Iterative Optimization for Semi-supervised Entity Alignment

Tao Meng, Shuo Shan, Hongen Shao, Yuntao Shou, Wei Ai, Keqin Li

TL;DR

This work tackles semi-supervised entity alignment across knowledge graphs when seed signals are limited and noisy. It introduces SE-GNN, a three-part framework: seed expansion based on neighborhood-level semantic information, a local-global awareness mechanism (LGAM) for richer embeddings, and a threshold nearest neighbor embedding correction strategy that applies Xavier-based embedding correction to mitigate distortion. Empirical results on DBP15K, SRPRS, and DWY100K show state-of-the-art performance against both traditional and semi-supervised baselines, and ablations confirm the value of seed expansion, LGAM, and embedding correction. The approach is practically relevant to scalable KG fusion, enabling robust alignment with less reliance on manually labeled seeds and greater resilience to noisy seeds.
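The summary does not spell out how the Xavier-based correction is applied. The sketch below is one possible reading, assuming that "correction" means re-initializing the embeddings of entities involved in seed pairs discarded as noise, drawing fresh values from a Xavier (Glorot) uniform distribution U(-b, b) with b = sqrt(6 / (fan_in + fan_out)). Only that bound is the standard Glorot formula; the function names, the nested-list embedding table, and the choice of which rows to reset are hypothetical and not taken from the paper.

```python
import math
import random
from typing import Iterable, List


def xavier_uniform_bound(fan_in: int, fan_out: int) -> float:
    """Glorot/Xavier uniform limit b, so that samples are drawn from U(-b, b)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def correct_embeddings(
    emb: List[List[float]], distorted_ids: Iterable[int], seed: int = 0
) -> List[List[float]]:
    """Hypothetical correction step: re-initialize the rows of `emb` whose
    entities are assumed to be distorted by noisy seed pairs."""
    rng = random.Random(seed)  # fixed seed for reproducible resets
    n, d = len(emb), len(emb[0])
    # Treat the table as an n x d weight matrix, following PyTorch's
    # convention for 2-D tensors: fan_in = d (columns), fan_out = n (rows).
    bound = xavier_uniform_bound(fan_in=d, fan_out=n)
    corrected = [row[:] for row in emb]  # copy so the input table is not mutated
    for i in distorted_ids:
        corrected[i] = [rng.uniform(-bound, bound) for _ in range(d)]
    return corrected
```

Returning a corrected copy keeps the pre-correction embeddings available for comparison; an in-place variant would work equally well inside a training loop.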

Abstract

Entity alignment aims to use pre-aligned seed pairs to find other equivalent entities across different knowledge graphs (KGs) and is widely used in graph fusion-related fields. However, as KGs grow in scale, manually annotating pre-aligned seed pairs becomes difficult. Existing research identifies potential seed pairs using entity embeddings obtained by aggregating a single type of structural information, thereby reducing the reliance on pre-aligned seed pairs. However, due to the structural heterogeneity of KGs, the quality of potential seed pairs obtained from a single type of structural information is not ideal. In addition, although existing methods improve the quality of potential seed pairs through semi-supervised iteration, they underestimate the impact that embedding distortion caused by noisy seed pairs has on alignment performance. To solve these problems, we propose a seed expanded-aware graph neural network with iterative optimization for semi-supervised entity alignment, named SE-GNN. First, we utilize the semantic attributes and structural features of entities, combined with a conditional filtering mechanism, to obtain high-quality initial potential seed pairs. Next, we design a local and global awareness mechanism. It introduces the initial potential seed pairs and combines local and global information to obtain a more comprehensive entity embedding representation, which alleviates the impact of the structural heterogeneity of KGs and lays the foundation for optimizing the initial potential seed pairs. Then, we design a threshold nearest neighbor embedding correction strategy. It combines a similarity threshold with the bidirectional nearest neighbor method as a filtering mechanism to select iterative potential seed pairs, and it uses an embedding correction strategy to eliminate embedding distortion.
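The filtering mechanism for iterative potential seed pairs combines a similarity threshold with bidirectional (mutual) nearest neighbors: a pair is kept only if each entity is the other's best match and their similarity clears the threshold. Here is a minimal sketch, assuming cosine similarity between entity embeddings held as nested Python lists. The function names and the threshold value are illustrative; the paper's similarity measure and threshold are not given in this summary.

```python
import math
from typing import List, Tuple


def cosine_similarity_matrix(
    src: List[List[float]], tgt: List[List[float]]
) -> List[List[float]]:
    """Cosine similarity between every source and every target embedding."""
    def norm(v: List[float]) -> float:
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors

    return [
        [sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v)) for v in tgt]
        for u in src
    ]


def select_seed_pairs(sim: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Keep (i, j) only if j is i's nearest target, i is j's nearest source,
    and sim[i][j] >= threshold."""
    n_src, n_tgt = len(sim), len(sim[0])
    # Row-wise argmax: best target entity for each source entity.
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Column-wise argmax: best source entity for each target entity.
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    pairs = []
    for i, j in enumerate(best_tgt):
        if best_src[j] == i and sim[i][j] >= threshold:
            pairs.append((i, j))
    return pairs
```

With sim = [[0.9, 0.1, 0.2], [0.3, 0.8, 0.85], [0.2, 0.7, 0.4]] and a threshold of 0.5, source entity 2 prefers target 1, but target 1 prefers source 1, so only (0, 0) and (1, 2) are kept. Requiring agreement in both directions discards one-sided matches, and the threshold then removes mutual matches that are still weak.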

Paper Structure

This paper contains 36 sections, 29 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: A cross-language knowledge graph composed of different data sources. Among entities with colored backgrounds, entities of the same color are equivalent. Entities with a gray background are non-equivalent. Both KGs contain incomplete information about Napoleon.
  • Figure 2: Framework diagram of SE-GNN. It consists of three parts: seed expansion, iterative optimization, and entity alignment. First, the seed expansion part obtains initial potential seed pairs from neighborhood-level semantic information and feeds them, together with the pre-aligned seed pairs, into the iterative optimization part. Next, the iterative optimization part optimizes the seed pairs and corrects entity embeddings through the local and global awareness mechanism and the threshold nearest neighbor embedding correction strategy. Finally, the potential seed pairs optimized over the iteration rounds are fed into the local and global awareness mechanism once more to obtain the final entity embeddings, which are used to perform entity alignment.
  • Figure 3: Details of the local and global awareness mechanism, which comprises the local relation awareness module and the global entity awareness module.
  • Figure 4: The effect of the number of neural network layers (a) and the number of high-order neighbors (b) on SE-GNN (tradi).
  • Figure 5: The effect of different seed ratios (a) and optimization round intervals (b) on SE-GNN (semi).
  • ...and 2 more figures
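The Figure 2 caption describes an iterate-then-realign control flow. The driver below sketches that flow under stated assumptions. `encode` stands in for the local and global awareness mechanism (updating an embedding state given the current seed pairs), `select` stands in for the threshold and bidirectional nearest neighbor filter, and `correct` stands in for the embedding correction. All three callables, the shared embedding state, and the rule that correction targets only the pairs dropped in a round are hypothetical placeholders, not the authors' interfaces.

```python
from typing import Any, Callable, Set, Tuple

Pair = Tuple[int, int]  # (source entity id, target entity id)


def se_gnn_iterate(
    pre_aligned: Set[Pair],
    initial_potential: Set[Pair],
    emb: Any,
    encode: Callable[[Any, Set[Pair]], Any],
    select: Callable[[Any], Set[Pair]],
    correct: Callable[[Any, Set[Pair]], Any],
    rounds: int,
) -> Tuple[Set[Pair], Any]:
    """Hypothetical driver for the iterative optimization part of Figure 2."""
    potential = set(initial_potential)  # output of the seed expansion part
    for _ in range(rounds):
        # Refine embeddings using pre-aligned plus current potential seed pairs.
        emb = encode(emb, pre_aligned | potential)
        # Re-select potential seed pairs from the refined embeddings.
        new_potential = select(emb)
        # Assumption: pairs dropped this round are treated as noise whose
        # influence on the embeddings should be corrected.
        discarded = potential - new_potential
        emb = correct(emb, discarded)
        potential = new_potential
    # Final pass with the optimized seed pairs; the result is used for alignment.
    emb = encode(emb, pre_aligned | potential)
    return potential, emb
```

Because each stage is passed in as a function, the loop can be exercised with trivial stubs before a real model is plugged in.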