Opportunities for Shape-based Optimization of Link Traversal Queries

Bryan-Elliott Tam, Ruben Taelman, Pieter Colpaert, Ruben Verborgh

TL;DR

We present an early version of a source selection algorithm for LTQP that uses mappings between RDF data shapes and linked data documents. Results show that, with little maintenance and work from the server, this method can reduce the execution time by up to 80% and the number of links traversed by up to 97% during realistic queries.

Abstract

Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) perform significantly worse and are less well understood. The two main difficulties of LTQP are the lack of a priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions that alleviate those difficulties. RDF data shapes are widely used to validate linked data documents, so it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP that uses mappings between RDF data shapes and linked data documents, and we measured its performance in a realistic setup. In this article, we present our algorithm and early results, opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce the execution time by up to 80% and the number of links traversed by up to 97% during realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.

Paper Structure (7 sections, 2 figures)

This paper contains 7 sections and 2 figures.

Figures (2)

  • Figure 1: First, the shape index is dereferenced; then, the query-shape containment operations are performed in the query engine; lastly, only the relevant resources are dereferenced.
  • Figure 2: The execution time with shape indexes is consistently lower than (by up to 80% for D1V3 and S1V3) or equal to that with type indexes, except for D3V3 and D3V4, and the shape index approach always uses fewer HTTP requests. Each query is denoted by the initial of its query template (e.g., S1 for interactive-short-1) followed by the version of the concrete query (e.g., V0). Values missing from the plot (D7V0 and D7V3) indicate that the query timed out before the end of the execution.
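
The three steps in the caption of Figure 1 can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the `ShapeIndexEntry` structure, the glob-style URL patterns, the `ex:` predicates, the `pod.example` URLs, and the predicate-subset test used as a stand-in for query-shape containment are all hypothetical. The index is assumed to have been dereferenced already and is passed in as a plain list.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ShapeIndexEntry:
    """Hypothetical index entry: resources matching url_pattern conform to a shape."""
    url_pattern: str       # glob-style pattern over resource URLs (assumption)
    predicates: frozenset  # predicates the shape allows
    closed: bool = True    # closed shape: no other predicate may appear


def shape_may_answer(entry, star_pattern_predicates):
    """Simplified stand-in for query-shape containment on one star pattern.

    An open shape says nothing about other predicates, so it is never ruled
    out. A closed shape is relevant only if it allows every predicate that
    the star pattern asks for.
    """
    if not entry.closed:
        return True
    return set(star_pattern_predicates) <= entry.predicates


def select_sources(shape_index, query_star_patterns, candidate_urls):
    """Return the candidate URLs that are still worth dereferencing."""
    selected = []
    for url in candidate_urls:
        entries = [e for e in shape_index if fnmatchcase(url, e.url_pattern)]
        if not entries:
            # The index does not cover this URL, so it cannot be pruned safely.
            selected.append(url)
        elif any(shape_may_answer(e, star)
                 for e in entries for star in query_star_patterns):
            selected.append(url)
    return selected


# Step 1 (assumed done): the dereferenced shape index, as plain data.
shape_index = [
    ShapeIndexEntry("https://pod.example/posts/*",
                    frozenset({"ex:content", "ex:creator"})),
    ShapeIndexEntry("https://pod.example/comments/*",
                    frozenset({"ex:replyOf", "ex:creator"})),
]
# The query has one star pattern that needs both predicates on the same subject.
query_star_patterns = [{"ex:content", "ex:creator"}]
candidate_urls = [
    "https://pod.example/posts/1",
    "https://pod.example/comments/7",
    "https://pod.example/profile/card",
]

# Steps 2 and 3: containment check, then keep only the relevant resources.
relevant = select_sources(shape_index, query_star_patterns, candidate_urls)
print(relevant)
```

In this sketch the comment resource is pruned because its closed shape does not allow `ex:content`, while the profile document is kept because the index says nothing about it. Keeping uncovered links is a conservative choice made here for illustration; a real engine would also need to handle shape references, cardinalities, and value constraints.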