Towards a theory of Façade-X data access: satisfiability of SPARQL basic graph patterns

Luigi Asprino, Enrico Daga

TL;DR

This work addresses the satisfiability problem for SPARQL basic graph patterns (BGPs) over RDF graphs that conform to Façade-X (FX), a meta-model that enables direct RDF access to heterogeneous data sources. It provides a consolidated Façade-X formalism with explicit RDF mappings and develops a theory plus two annotated algorithms (top-down and bottom-up CSP) to decide BGP satisfiability in FX. Extensive experiments on synthetic benchmarks and real-world queries demonstrate practical feasibility and highlight the advantage of checking satisfiability before evaluation, which avoids costly data loading for queries that cannot yield a solution. The results enable more efficient query planning and data integration in knowledge graphs using Façade-X, and guide future work on streaming query execution and FX-specific data-access optimizations.

Abstract

Data integration is the primary use case for knowledge graphs. However, the data to be integrated are not typically graphs but come in different formats, for example, CSV, XML, or a relational database. Façade-X is a recently proposed method for providing direct access to an open-ended set of data formats. The method includes a meta-model that specialises RDF to fit general data structures. This model allows SPARQL queries to be expressed against data sources with those structures. Previous work formalised Façade-X and demonstrated how it can theoretically represent any format expressible with a context-free grammar, as well as the relational model. A reference implementation, SPARQL Anything, demonstrates the feasibility of the approach in practice. Notably, Façade-X uses only a fraction of RDF and, consequently, not all SPARQL queries yield a solution (i.e. are satisfiable) when evaluated over a Façade-X graph. In this article, we consolidate Façade-X and study the satisfiability of basic graph patterns. The theory is accompanied by an algorithm for deciding the satisfiability of basic graph patterns on Façade-X data sources. Furthermore, we provide extensive experiments with a proof-of-concept implementation, demonstrating practical feasibility, including with real-world queries. Our results pave the way for studying query execution strategies for Façade-X data access with SPARQL and for supporting developers in building more efficient data integration systems for knowledge graphs.

Paper Structure (43 sections, 11 theorems, 5 figures, 25 tables, 6 algorithms)

Key Result

Proposition 1

Every non-root container is recursively contained by the root container.
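
To make the statement of Proposition 1 concrete, the following is a minimal sketch that checks the property on a toy structure. It is not the paper's formalism or algorithm: the dictionary-based model, the container identifiers, and the function names `reachable_containers` and `satisfies_connectedness` are all assumptions introduced here for illustration only.

```python
from collections import deque

# Hypothetical toy model (an assumption, not the paper's formalism):
# each key is a container identifier, and its list holds the values of
# its slots. A slot value that is itself a key denotes a nested container;
# any other value is treated as a literal.


def reachable_containers(containers, root):
    """Return the set of containers recursively contained by `root`, plus `root`."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for value in containers[current]:
            # Follow only slot values that denote containers not yet visited.
            if value in containers and value not in seen:
                seen.add(value)
                queue.append(value)
    return seen


def satisfies_connectedness(containers, root):
    """True when every container in the model is reachable from the root."""
    return reachable_containers(containers, root) == set(containers)


# A CSV-like structure: the root contains one container per row.
csv_like = {
    "root": ["row1", "row2"],
    "row1": ["Alice", "30"],
    "row2": ["Bob", "25"],
}

# The same structure with a container that no slot refers to.
orphaned = dict(csv_like, stray=["x"])

print(satisfies_connectedness(csv_like, "root"))
print(satisfies_connectedness(orphaned, "root"))
```

In this toy encoding, `csv_like` has the property because both row containers sit in slots of the root, while `orphaned` does not because `stray` is never contained by anything. The encoding conflates literals and identifiers that happen to share a string, which is acceptable for a sketch but not for real data.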

Figures (5)

  • Figure 1: Example CSV file
  • Figure 2: A diagrammatic representation of the Façade-X model. Unlabelled arcs represent subsumption (e.g. Root is a type of Container).
  • Figure 3: Dimension of BGPs from real-world queries: number of triples.
  • Figure 4: Dimension of BGPs from real-world queries: number of variables.
  • Figure 5: BGPs by number of variables.

Theorems & Definitions (18)

  • Example 1
  • Proposition 1: Connectedness of containers
  • Proposition 2
  • Example 2
  • Example 3
  • Example 4
  • Corollary 3
  • Corollary 4
  • Corollary 5
  • Corollary 6
  • ...and 8 more