Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach

Bryan-Elliott Tam, Ruben Taelman, Julián Rojas Meléndez, Pieter Colpaert

TL;DR

The paper addresses efficient traversal of sensor data in decentralized linked-data environments by leveraging Link Traversal Query Processing (LTQP) over TREE-fragmented datasets. It introduces a rule-based reachability criterion that treats TREE constraints as boolean expressions $E$ and integrates the query filter $F$ into the source selection process, using a boolean solver to decide which links to follow based on the satisfiability of $F(x) \land E_i$ and a reachability predicate $c(i)$. Implemented in the Comunica engine, the approach demonstrates substantial reductions in query execution time and HTTP requests compared to a predicate-only strategy, while preserving completeness in preliminary tests. The results highlight that the internal triple-store size can significantly impact performance and point to future work on broader fragmentation strategies and more expressive online reasoning for source selection during traversal. This work suggests a practical path toward faster, more scalable traversal-based querying over unindexed, decentralized RDF data.
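As a worked illustration of the satisfiability check, take the relation constraint shown in Figure 1 as $E_1$ and assume a hypothetical query filter $F$ (not taken from the paper) that asks for measurements published before 2022-01-01:

$$E_1:\ t \geq \text{2022-01-03T09:47:59}, \qquad F(t):\ t < \text{2022-01-01T00:00:00}$$

$$F(t) \land E_1 \;\equiv\; \left(t < \text{2022-01-01T00:00:00}\right) \land \left(t \geq \text{2022-01-03T09:47:59}\right) \;\equiv\; \bot$$

No timestamp satisfies both conditions, so under this reading the link to that fragment would not be followed. With a different hypothetical filter, say $F'(t):\ t > \text{2022-01-04T00:00:00}$, the conjunction $F'(t) \land E_1$ is satisfiable (any $t$ after 2022-01-04 works), so the link would be kept.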

Abstract

Link Traversal queries face challenges in completeness and long execution times due to the size of the web. Reachability criteria define completeness by restricting the links followed by engines. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures exploitable by query engines to prune irrelevant sources. Current criteria rely on information from the query definition and predefined predicates. However, it is difficult to use them to traverse environments where logical expressions indicate the location of resources. We propose a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary to prune irrelevant sources. Our preliminary findings show that by using this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing the completeness of results. Our work shows that the investigation of hypermedia controls in link pruning of traversal queries is a worthy effort for optimizing web queries of unindexed decentralized databases.

Paper Structure (4 sections, 2 equations, 2 figures, 1 table)

This paper contains 4 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: On the left is a SPARQL query to get sensor measurements and information about the sensor. On the right is the hypermedia description of the location and constraint of the next fragment, located at ex:nextNode. The constraint describes publication times ($?t$) where $?t \geq \text{2022-01-03T09:47:59.000000}$.
  • Figure 2: A schematization of our rule-based reachability criterion with a TREE document. First, a TREE node is dereferenced. Then the TREE relations are transformed into boolean expressions $E$, followed by the construction of $F$ from the part of the filter expression related to the path of $E$ (the variable $t$, related to saref:hasTimestamp). Next, the satisfiability of $E \land F$ is determined, and finally links to data that is not relevant to the query are pruned.
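
The steps in Figure 2 can be sketched in TypeScript as follows. This is a minimal illustration under strong assumptions, not the Comunica implementation: every name here (`Comparison`, `TreeRelation`, `isSatisfiable`, `pruneLinks`, `ex:oldNode`) is hypothetical, each relation constraint $E_i$ is reduced to a single comparison on one timestamp variable, the filter $F$ is a conjunction of such comparisons, and satisfiability is decided by interval intersection rather than by a general boolean solver.

```typescript
// Hypothetical simplified model: one variable (a timestamp in epoch ms),
// constraints are single comparisons, filters are conjunctions of comparisons.
type Operator = ">" | ">=" | "<" | "<=" | "=";

interface Comparison {
  operator: Operator;
  value: number;
}

interface TreeRelation {
  node: string;           // link to the next fragment
  constraint: Comparison; // boolean expression E_i attached to the link
}

interface Interval {
  min: number;
  minInclusive: boolean;
  max: number;
  maxInclusive: boolean;
}

// Translate a comparison into the interval of values that satisfy it.
function toInterval(c: Comparison): Interval {
  switch (c.operator) {
    case ">":  return { min: c.value, minInclusive: false, max: Infinity, maxInclusive: false };
    case ">=": return { min: c.value, minInclusive: true, max: Infinity, maxInclusive: false };
    case "<":  return { min: -Infinity, minInclusive: false, max: c.value, maxInclusive: false };
    case "<=": return { min: -Infinity, minInclusive: false, max: c.value, maxInclusive: true };
    case "=":  return { min: c.value, minInclusive: true, max: c.value, maxInclusive: true };
  }
}

// Keep the tighter lower bound and the tighter upper bound of two intervals.
function intersect(a: Interval, b: Interval): Interval {
  const lower = a.min > b.min ? a : b.min > a.min ? b : null;
  const upper = a.max < b.max ? a : b.max < a.max ? b : null;
  return {
    min: lower ? lower.min : a.min,
    minInclusive: lower ? lower.minInclusive : a.minInclusive && b.minInclusive,
    max: upper ? upper.max : a.max,
    maxInclusive: upper ? upper.maxInclusive : a.maxInclusive && b.maxInclusive,
  };
}

function isEmpty(i: Interval): boolean {
  if (i.min > i.max) return true;
  if (i.min === i.max) return !(i.minInclusive && i.maxInclusive);
  return false;
}

// F AND E_i is satisfiable when the intersection of all intervals is non-empty.
function isSatisfiable(filter: Comparison[], constraint: Comparison): boolean {
  const combined = filter.map(toInterval).reduce(intersect, toInterval(constraint));
  return !isEmpty(combined);
}

// Return only the links whose constraint is compatible with the query filter.
function pruneLinks(relations: TreeRelation[], filter: Comparison[]): string[] {
  return relations
    .filter((relation) => isSatisfiable(filter, relation.constraint))
    .map((relation) => relation.node);
}

// Example: the Figure 1 constraint (UTC assumed) plus a hypothetical older fragment.
const boundary = Date.parse("2022-01-03T09:47:59.000Z");
const relations: TreeRelation[] = [
  { node: "ex:nextNode", constraint: { operator: ">=", value: boundary } },
  { node: "ex:oldNode", constraint: { operator: "<", value: boundary } },
];
// Hypothetical query filter: ?t > boundary + 1 second.
const followed = pruneLinks(relations, [{ operator: ">", value: boundary + 1000 }]);
console.log(followed); // [ 'ex:nextNode' ]
```

The interval representation keeps the sketch short, but it only covers conjunctions over a single ordered variable. Filters with disjunctions, negation, or several variables would need a more general satisfiability check, which is the role the boolean solver plays in the approach described above.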