HybEA: Hybrid Models for Entity Alignment
Nikolaos Fanourakis, Fatia Lekbour, Guillaume Renton, Vasilis Efthymiou, Vassilis Christophides
TL;DR
HybEA tackles the dual challenges of semantic and structural heterogeneity in knowledge graph entity alignment by combining a novel attention-based factual model with a state-of-the-art structural embedding model in a semi-supervised co-training loop. The framework is modular: the factual component computes attribute-focused evidence via attention (with Sentence-BERT embeddings for literals), and the structural component can be swapped between Knowformer (triplet-based) and RREA (graph-based) to suit dataset characteristics. A reciprocity filter identifies high-confidence matches that enrich the training set in successive cycles, while a final bipartite matching step handles the remaining candidates, all without enforcing a strict one-to-one constraint. Extensive experiments on ten datasets show HybEA achieving state-of-the-art performance: a 16% average relative gain in Hits@1 (ranging from 3.6% to 40%) on five monolingual datasets, plus robust improvements on three multilingual datasets and on two datasets that drop the one-to-one assumption. Ablation and efficiency analyses confirm the necessity of both components and demonstrate the practical adaptability and effectiveness of the approach.
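The reciprocity filter mentioned above can be illustrated with a small sketch. The text does not spell out the exact criterion, so this example assumes the common mutual-nearest-neighbor reading: a pair is kept only when each entity is the other's top-ranked candidate. The function name `reciprocal_pairs` and the toy similarity matrix are hypothetical and are not taken from the HybEA code.

```python
def reciprocal_pairs(sim):
    """Return (i, j) pairs where source i and target j rank each other first.

    `sim` is a list of rows; sim[i][j] is the similarity between source
    entity i and target entity j. This mutual-top-1 rule is an assumed
    reading of "reciprocity", not the paper's confirmed definition.
    """
    if not sim or not sim[0]:
        return []
    n_rows, n_cols = len(sim), len(sim[0])
    # Best target column for every source row.
    best_col = [max(range(n_cols), key=lambda j: sim[i][j]) for i in range(n_rows)]
    # Best source row for every target column.
    best_row = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    # Keep a pair only if the preference is mutual.
    return [(i, j) for i, j in enumerate(best_col) if best_row[j] == i]


# Hypothetical similarity scores between 3 source and 3 target entities.
toy_sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.7, 0.5, 0.6],
]
confident = reciprocal_pairs(toy_sim)
print(confident)
```

In a co-training loop of the kind described, pairs returned by such a filter would be added to the training set for the next cycle, while the unmatched entities (source 2 in this toy case) remain candidates for later cycles or for the final matching step.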
Abstract
Entity Alignment (EA) aims to detect descriptions of the same real-world entities among different Knowledge Graphs (KGs). Several embedding methods have been proposed to rank potentially matching entities of two KGs according to their similarity in the embedding space. However, existing EA embedding methods are challenged by the diverse levels of structural (i.e., neighborhood entities) and semantic (e.g., entity names and literal property values) heterogeneity exhibited by real-world KGs, especially when they span several domains (DBpedia, Wikidata). Existing methods typically focus on only one of the two kinds of heterogeneity, depending on the context (monolingual vs. multilingual). To address this limitation, we propose a flexible framework called HybEA, a hybrid of two models: a novel attention-based factual model, co-trained with a state-of-the-art structural model. Our experimental results demonstrate that HybEA outperforms state-of-the-art EA systems, achieving a 16% average relative improvement in Hits@1, ranging from 3.6% up to 40%, on 5 monolingual datasets, some of which can now be considered solved. We also show that HybEA outperforms state-of-the-art methods on 3 multilingual datasets, as well as on 2 datasets that drop the unrealistic, yet widely adopted, one-to-one assumption. Overall, HybEA outperforms all (11) baseline methods in all (3) measures and on all (10) datasets evaluated, with a statistically significant difference.
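To make the reported metric concrete, the following minimal sketch computes Hits@k and a relative improvement between two scores. The helper names, the toy rankings, and the scores 0.80 and 0.92 are hypothetical illustrations, not results from the paper. Gold matches are stored as sets so that a source entity may have more than one correct target, which is one assumed way to accommodate datasets without the one-to-one constraint.

```python
def hits_at_k(rankings, gold, k=1):
    """Fraction of source entities with a correct target among the top k.

    rankings: dict mapping a source entity to its candidate targets,
              ordered from most to least similar.
    gold:     dict mapping a source entity to the set of correct targets.
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for source, targets in gold.items()
        if any(c in targets for c in rankings.get(source, [])[:k])
    )
    return hits / len(gold)


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.15 means 15%."""
    return (new - old) / old


# Hypothetical ranked candidates and gold matches.
toy_rankings = {
    "a": ["x", "y"],
    "b": ["y", "z"],
    "c": ["z", "x"],
    "d": ["w", "x"],
}
toy_gold = {"a": {"x"}, "b": {"z"}, "c": {"z"}, "d": {"x"}}

h1 = hits_at_k(toy_rankings, toy_gold, k=1)
h2 = hits_at_k(toy_rankings, toy_gold, k=2)
gain = relative_improvement(0.92, 0.80)  # hypothetical Hits@1 scores
print(h1, h2, round(gain, 2))
```

Note that a relative improvement is measured against the baseline score, so a 16% relative gain is not the same as a gain of 16 percentage points.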
