
Universal Knowledge Graph Embeddings

N'Dah Jean Kouagou, Caglar Demir, Hamada M. Zahera, Adrian Wilke, Stefan Heindorf, Jiayi Li, Axel-Cyrille Ngonga Ngomo

TL;DR

This work tackles the problem of non-aligned embeddings across large knowledge graphs by introducing universal knowledge graph embeddings, learned on a single graph fused from multiple sources via owl:sameAs links. It presents a graph fusion pipeline that assigns one identity to each shared entity and then applies knowledge graph embedding models (notably ConEx) to learn cross-graph representations, demonstrated on a merged DBpedia+Wikidata graph with about 180 million entities. Experimental results show that universal embeddings outperform single-graph embeddings in link prediction, with ConEx providing the strongest gains, which underlines the benefit of integrating information from multiple sources. An open, FAIR-compliant API and an open-source release provide broad access and allow potential integration into multi-source entity resolution and graph foundation models, with practical impact for cross-graph reasoning and search.

Abstract

A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of the knowledge graph within a link prediction setting. As a result, the embeddings reflect only the structure of a single knowledge graph, and embeddings for different knowledge graphs are not aligned, e.g., they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, knowledge graph embedding applications such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identity. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata, yielding embeddings for about 180 million entities, 15 thousand relations, and 1.2 billion triples. We believe our computed embeddings will support the emerging field of graph foundation models. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction suggest that universal knowledge graph embeddings encode better semantics than embeddings computed on a single knowledge graph. For reproducibility, we make our source code and datasets openly available.
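
The fusion step described in the abstract (merging graphs along owl:sameAs so that every entity has a unique identity) can be illustrated with a small sketch. This is not the authors' pipeline: it assumes that sameAs links are treated as an equivalence relation, that the representative of each equivalence class is simply the lexicographically smallest identifier, and that the sameAs triples themselves are dropped from the fused graph. The function names (`find`, `union`, `fuse`) and the toy triples are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's equivalence class (with path compression)."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent: Dict[str, str], a: str, b: str) -> None:
    """Merge the classes of a and b; the smaller identifier becomes the root.

    Choosing the smaller string is an assumption made here so that the
    representative is deterministic; any fixed rule would do.
    """
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        lo, hi = sorted((ra, rb))
        parent[hi] = lo


def fuse(triples: Iterable[Triple]) -> Set[Triple]:
    """Rewrite all triples so that sameAs-equivalent entities share one identity."""
    triples = list(triples)
    parent: Dict[str, str] = {}
    # Pass 1: build equivalence classes from the sameAs links.
    for s, p, o in triples:
        if p == SAME_AS:
            union(parent, s, o)
    # Pass 2: replace subjects and objects by their representative and
    # drop the sameAs triples (assumption: they carry no further signal).
    fused: Set[Triple] = set()
    for s, p, o in triples:
        if p != SAME_AS:
            fused.add((find(parent, s), p, find(parent, o)))
    return fused


# Toy input: two sources describing the same two entities.
toy = [
    ("dbr:Berlin", SAME_AS, "wd:Q64"),
    ("dbr:Germany", SAME_AS, "wd:Q183"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("wd:Q64", "wdt:P17", "wd:Q183"),
]
fused_graph = fuse(toy)
for triple in sorted(fused_graph):
    print(triple)
```

After fusion, both sources contribute triples about the same two node identifiers, so an embedding model trained on the fused graph learns one vector per real-world entity rather than one per source.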

Paper Structure (17 sections, 1 equation, 3 tables, 1 algorithm)