Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets

Xuhui Jiang, Chengjin Xu, Yinghan Shen, Yuanzhuo Wang, Fenglong Su, Fei Sun, Zixuan Li, Zhichao Shi, Jian Guo, Huawei Shen

TL;DR

This work addresses the gap between practical, highly heterogeneous knowledge graphs (HHKGs) and existing entity alignment (EA) benchmarks by introducing two realistic datasets, ICEWS-WIKI and ICEWS-YAGO, that preserve scale differences, low structure similarity, and temporal information. It shows that GNN-based EA methods struggle under HHKG conditions, where entity name and temporal information become more reliable cues than structure. The authors propose Simple-HHEA, a simple yet effective model that fuses name, time, and optionally structure information, and demonstrate its superior performance and efficiency on HHKGs. The findings underscore the need for adaptable, information-quality-aware EA designs and richer HHKG-focused datasets to drive practical improvements.

Abstract

The flourishing of knowledge graph applications has driven the need for entity alignment (EA) across KGs. However, the heterogeneity of practical KGs, characterized by differing scales, structures, and limited overlapping entities, greatly surpasses that of existing EA datasets. This discrepancy highlights an oversimplified heterogeneity in current EA datasets, which obstructs a full understanding of the advancements achieved by recent EA methods. In this paper, we study the performance of EA methods in practical settings, specifically focusing on the alignment of highly heterogeneous KGs (HHKGs). Firstly, we address the oversimplified heterogeneity settings of current datasets and propose two new HHKG datasets that closely mimic practical EA scenarios. Then, based on these datasets, we conduct extensive experiments to evaluate previous representative EA methods. Our findings reveal that, in aligning HHKGs, valuable structure information can hardly be exploited through message-passing and aggregation mechanisms. This phenomenon leads to inferior performance of existing EA methods, especially those based on GNNs. These findings shed light on the potential problems associated with the conventional application of GNN-based methods as a panacea for all EA datasets. Consequently, in light of these observations and to elucidate what EA methodology is genuinely beneficial in practical scenarios, we undertake an in-depth analysis by implementing a simple but effective approach: Simple-HHEA. This method adaptively integrates entity name, structure, and temporal information to navigate the challenges posed by HHKGs. Our experimental results indicate that the key to future EA model design in practice lies in adaptability and efficiency under varying information-quality conditions, as well as the capability to capture patterns across HHKGs.

Paper Structure (25 sections, 5 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: An example of the HHKGs (ICEWS and WIKI) and their statistics. The scale and density of the KGs are very different, and the overlapping ratios (45.79% and 31.82%) indicate that the two KGs are far from the 1-to-1 assumption.
  • Figure 2: Comparison of degree distributions in DBP15K(EN-FR), DBP-WIKI, and ICEWS-WIKI/YAGO. The X-axis denotes degree and the Y-axis represents the ratio of entities.
  • Figure 3: An example of feature and pre-aligned entity pair label propagation in the message-passing mechanism.
  • Figure 4: Case studies of the Dual-AMN(name). The X-axis denotes the performance of the Dual-AMN(name), measured in terms of Rank Score (higher indicates poorer performance). The Y-axis represents the average structure similarity, calculated based on the similarity in the structure of neighboring entities with correct alignment labels in the training set.
  • Figure 5: Comparison of different structure mask ratios on the ICEWS-WIKI/YAGO. The X-axis denotes the mask proportions of facts, and the Y-axis represents the Hits@1 metric.
  • ...and 2 more figures
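
The Hits@1 metric on the Y-axis of Figure 5 is the standard EA ranking metric: the fraction of test entities whose correct counterpart is ranked first among all candidates. The sketch below is a minimal illustration of Hits@k, not the authors' evaluation code. The function name `hits_at_k`, the dense similarity-matrix input, and the optimistic handling of ties are all assumptions made for this example.

```python
from typing import Sequence


def hits_at_k(
    similarity: Sequence[Sequence[float]],
    gold: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of source entities whose gold target is within the top-k candidates.

    similarity[i][j] is an assumed similarity score between source entity i
    and candidate target entity j; gold[i] is the index of the correct target.
    """
    if len(similarity) != len(gold):
        raise ValueError("similarity and gold must have the same length")
    if not gold:
        return 0.0

    hits = 0
    for scores, target in zip(similarity, gold):
        # Zero-based rank of the gold target: the number of candidates that
        # score strictly higher (ties are resolved in favour of the gold target).
        rank = sum(1 for s in scores if s > scores[target])
        if rank < k:
            hits += 1
    return hits / len(gold)


if __name__ == "__main__":
    # Toy scores for three source entities and three candidate targets.
    sim = [
        [0.9, 0.1, 0.0],
        [0.2, 0.3, 0.5],
        [0.1, 0.8, 0.4],
    ]
    gold = [0, 1, 2]
    print(hits_at_k(sim, gold, k=1))
    print(hits_at_k(sim, gold, k=2))
```

In the toy data only the first source entity has its gold target ranked first, so raising `k` from 1 to 2 is what brings the other two entities into the hit set.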