Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach
Bryan-Elliott Tam, Ruben Taelman, Julián Rojas Meléndez, Pieter Colpaert
TL;DR
The paper addresses efficient traversal of sensor data in decentralized linked-data environments by leveraging Link Traversal Query Processing (LTQP) over TREE-fragmented datasets. It introduces a rule-based reachability criterion that treats TREE constraints as boolean expressions $E$ and integrates the query filter $F$ into the source selection process, using a boolean solver to decide which links to follow based on the satisfiability of $F(x) \land E_i$ and a reachability predicate $c(i)$. Implemented in the Comunica engine, the approach demonstrates substantial reductions in query execution time and HTTP requests compared to a predicate-only strategy, while preserving completeness in preliminary tests. The results highlight that the internal triple-store size can significantly impact performance and point to future work on broader fragmentation strategies and more expressive online reasoning for source selection during traversal. This work suggests a practical path toward faster, more scalable traversal-based querying over unindexed, decentralized RDF data.
Abstract
Link traversal queries struggle with completeness and long execution times because of the size of the Web. Reachability criteria define completeness by restricting the links that engines follow. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures that query engines can exploit to prune irrelevant sources. Current criteria rely on information from the query definition and on predefined predicates, which makes them difficult to use in environments where logical expressions indicate the location of resources. We propose a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary in order to prune irrelevant sources. Our preliminary findings show that, with this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing the completeness of results. Our work shows that investigating hypermedia controls for link pruning in traversal queries is a worthwhile effort for optimizing web queries over unindexed decentralized databases.
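The pruning rule summarized above (follow link $i$ only when $F(x) \land E_i$ is satisfiable) can be illustrated with a minimal sketch. This is not the Comunica implementation and does not use a general boolean solver: it assumes that both the query filter $F$ and each link constraint $E_i$ are conjunctions of simple comparisons on a single numeric variable $x$ (for example, a timestamp reduced to a number), so satisfiability becomes an interval check. The names `Constraint`, `Link`, `satisfiable`, and `links_to_follow`, as well as the `example.org` URLs, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    """A single comparison `x <op> value` on the shared variable x."""
    op: str       # one of ">", ">=", "<", "<=", "="
    value: float


@dataclass(frozen=True)
class Link:
    """A candidate link i with its constraint E_i (a conjunction)."""
    url: str
    relation: tuple


def satisfiable(constraints):
    """Return True if some real x satisfies every constraint at once."""
    low, low_strict = -math.inf, False
    high, high_strict = math.inf, False
    for c in constraints:
        if c.op not in (">", ">=", "<", "<=", "="):
            raise ValueError(f"unsupported operator: {c.op}")
        # Tighten the lower bound for >, >= and =.
        if c.op in (">", ">=", "="):
            strict = c.op == ">"
            if c.value > low or (c.value == low and strict):
                low, low_strict = c.value, strict
        # Tighten the upper bound for <, <= and =.
        if c.op in ("<", "<=", "="):
            strict = c.op == "<"
            if c.value < high or (c.value == high and strict):
                high, high_strict = c.value, strict
    if low < high:
        return True
    # The bounds meet at one point: satisfiable only if both are inclusive.
    return low == high and not (low_strict or high_strict)


def links_to_follow(query_filter, links):
    """Keep the links whose constraint E_i is compatible with the filter F."""
    return [
        link.url
        for link in links
        if satisfiable(list(query_filter) + list(link.relation))
    ]


# Assumed query filter F: 10 <= x < 20.
query_filter = (Constraint(">=", 10), Constraint("<", 20))

# Assumed links found in a fragment, each with its constraint E_i.
links = (
    Link("https://example.org/page-a", (Constraint("<", 10),)),
    Link("https://example.org/page-b", (Constraint(">=", 10),)),
    Link("https://example.org/page-c", (Constraint(">=", 20),)),
    Link("https://example.org/page-d", (Constraint(">=", 15),)),
)

followed = links_to_follow(query_filter, links)
print(followed)
```

In this example, `page-a` and `page-c` are pruned because no value of $x$ can satisfy both the filter and their constraints, so the engine would not dereference them. A real engine would need a more general solver to handle disjunctions, negation, and several variables.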
