Condensed Representation of RDF and its Application on Graph Versioning

Jey Puget Gil, Emmanuel Coquery, John Samuel, Gilles Gesquiere

TL;DR

This work formalizes a condensed representation for evolving RDF graphs and develops QuaQue, a system within the ConVer-G framework, to query across multiple graph versions. It introduces flat and condensed dataset models, an extended SPARQL algebra, and metadata operators to support version-aware querying, along with proofs of equivalence between representations. Benchmark results show that condensed representations save space and can offer favorable performance for certain aggregative workloads, while flat representations excel for non-aggregative queries. The approach aims to enable multi-view, provenance-aware analysis of evolving knowledge graphs and integrates with existing RDF infrastructure through a translation layer. Overall, it provides a principled foundation for efficient, version-aware RDF data management with practical implications for domains like urban data, biomedical informatics, and software evolution.

Abstract

Evolving phenomena, which are often complex, can be represented using knowledge graphs, which can model heterogeneous data from multiple sources. A considerable number of sources delivering periodic updates to knowledge graphs in various domains are now openly available. The evolution of data is of interest to knowledge graph management systems, so it is crucial to organize these constantly evolving data to make them easily accessible and exploitable for analysis. In this article, we present and formalize the condensed representation of these evolving graphs and propose a new solution, QuaQue, that allows querying across multiple versions of graphs. We also present the results of a benchmark comparing our solution against existing approaches.

Paper Structure

This paper contains 109 sections, 6 theorems, 57 equations, 8 figures, 12 tables.

Key Result

theorem 1

Given a flat model $d_F$, its image under the composition of the $C$ and $F$ functions is equal to $d_F$ itself.
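The round-trip property can be illustrated with a minimal sketch, reading $C$ as condensation and $F$ as flattening (definitions 1 and 2). The encodings below are assumptions made for illustration only and are not the paper's formal definitions: a flat dataset is taken to be a set of (subject, predicate, object, version) tuples, and a condensed dataset maps each triple to the set of versions in which it holds. The names `condense` and `flatten` and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical encodings, NOT the paper's formal models:
#   flat dataset      : set of (subject, predicate, object, version) tuples
#   condensed dataset : dict mapping (subject, predicate, object)
#                       to a frozenset of versions


def condense(flat):
    """Group versioned statements by triple, collecting their versions."""
    grouped = defaultdict(set)
    for s, p, o, v in flat:
        grouped[(s, p, o)].add(v)
    return {triple: frozenset(versions) for triple, versions in grouped.items()}


def flatten(condensed):
    """Expand each triple back into one statement per version."""
    return {
        (s, p, o, v)
        for (s, p, o), versions in condensed.items()
        for v in versions
    }


# Illustrative data: one triple shared by two versions, one unique to a third.
flat = {
    ("ex:building1", "ex:height", "12", "v1"),
    ("ex:building1", "ex:height", "12", "v2"),
    ("ex:building1", "ex:height", "15", "v3"),
}

condensed = condense(flat)
round_trip = flatten(condensed)
```

Under these assumptions, condensing and then flattening returns the original flat dataset, and the condensed form stores the repeated triple only once, which gives an intuition for the space savings mentioned in the TL;DR.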

Figures (8)

  • Figure 1: Example of an RDF triple
  • Figure 2: Example of a named graph
  • Figure 3: Dataset, algebra and results representation
  • Figure 4: Hierarchy of variable representations in SPARQL algebra
  • Figure 5: Versioning ontology
  • ...and 3 more figures

Theorems & Definitions (33)

  • definition 1: Condensation
  • definition 2: Flattening
  • theorem 1: $C \circ F(d_F) = d_F$
  • theorem 2: $F \circ C(d_C) = d_C$
  • definition 3: Variable representation
  • definition 4: Environment
  • definition 5: Domain of a set of environments
  • definition 6: Lower environment of a variable
  • definition 7: Logical provability of an environment
  • lemma 1: Greatest Environment Typing
  • ...and 23 more