An Algebraic Foundation for Knowledge Graph Construction (Extended Version)
Sitt Min Oo, Olaf Hartig
TL;DR
Knowledge graphs are populated by declarative mappings from heterogeneous data sources, but existing mapping languages lack a formal foundation, which risks inconsistent semantics and hinders optimization. The paper proposes a language-agnostic algebra for mapping definitions, built on a relational-style data model with mapping relations $r=(A,I)$ and five types of operators that can be composed into algebra expressions. It shows how RML can be translated into the algebra, which also gives RML a formal semantics, and it provides algebraic equivalences that serve as rewrite rules for provably correct transformations of mapping plans. The algebra thus offers a unified basis for implementing and optimizing KG construction engines, supports interoperability across mapping languages and input sources, and gives a formal framework for comparing, translating, and extending KG mapping languages.
Abstract
Although they have existed for more than ten years, have attracted diverse implementations, and have been used successfully in a significant number of applications, declarative mapping languages for constructing knowledge graphs from heterogeneous types of data sources still lack a solid formal foundation. This makes it impossible to introduce implementation and optimization techniques that are provably correct and, in fact, has led to discrepancies between different implementations. Moreover, it precludes the study of fundamental properties of different languages (e.g., expressive power). To address this gap, this paper introduces a language-agnostic algebra for capturing mapping definitions. As further contributions, we show that the popular mapping language RML can be translated into our algebra (which also provides a formal definition of the semantics of RML), and we prove several algebraic rewriting rules that can be used to optimize mapping plans based on our algebra.
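To make the idea of an algebra with rewriting rules more concrete, the following is a minimal Python sketch of a mapping relation $r=(A,I)$, read here as a set of attributes $A$ together with a set $I$ of tuples over those attributes. The operator names `extend` and `project` are hypothetical stand-ins chosen for illustration; they are not claimed to match the operators defined in the paper, and the equivalence checked at the end is an assumed example of the kind of rule such an algebra could justify, not one of the paper's proven rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# A tuple of the relation, stored as sorted (attribute, value) pairs so it is hashable.
Row = Tuple[Tuple[str, object], ...]


@dataclass(frozen=True)
class MappingRelation:
    attrs: FrozenSet[str]  # A: the attributes of the relation
    rows: FrozenSet[Row]   # I: the set of tuples over A


def relation(attrs: Iterable[str], dict_rows: Iterable[Dict[str, object]]) -> MappingRelation:
    """Build a relation, checking that every tuple is defined on exactly A."""
    attr_set = frozenset(attrs)
    rows = set()
    for d in dict_rows:
        if set(d) != attr_set:
            raise ValueError(f"tuple {d} does not match attributes {sorted(attr_set)}")
        rows.add(tuple(sorted(d.items())))
    return MappingRelation(attr_set, frozenset(rows))


def extend(r: MappingRelation, new_attr: str,
           fn: Callable[[Dict[str, object]], object]) -> MappingRelation:
    """Hypothetical operator: add one attribute computed from each tuple."""
    if new_attr in r.attrs:
        raise ValueError(f"attribute {new_attr!r} already exists")
    out = []
    for row in r.rows:
        d = dict(row)
        d[new_attr] = fn(d)
        out.append(d)
    return relation(r.attrs | {new_attr}, out)


def project(r: MappingRelation, keep: Iterable[str]) -> MappingRelation:
    """Hypothetical operator: keep only the listed attributes (set semantics)."""
    keep_set = frozenset(keep)
    if not keep_set <= r.attrs:
        raise ValueError("cannot project onto attributes that are not in the relation")
    return relation(keep_set, [{a: v for a, v in row if a in keep_set} for row in r.rows])


# Illustrative data: two records as they might come from a tabular source.
people = relation({"id", "name"}, [
    {"id": "1", "name": "Ada"},
    {"id": "2", "name": "Alan"},
])

# A plan that first computes an IRI and then projects it away again ...
with_iri = extend(people, "iri", lambda t: "http://example.org/person/" + str(t["id"]))
unoptimized = project(with_iri, {"id", "name"})

# ... is equivalent to a plan that never computes the IRI at all.
optimized = project(people, {"id", "name"})
assert unoptimized == optimized
```

The point of the sketch is only that, once mapping steps are operators over a well-defined data model, two plans can be compared for equality of their results, which is what makes a rewrite rule checkable rather than a heuristic.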
