Not All RDF is Created Equal: Investigating RDF Load Times on Resource-Constrained Devices

Piotr Sowinski, Anh Le-Tuan, Pawel Szmeja, Maria Ganzha

TL;DR

The paper introduces the notion of relative loading speed (RLS) and uses it to show that loading speed can differ between datasets by as much as a factor of 9.01. This is clear evidence that "not all RDF is created equal" and stresses the importance of using multiple benchmark datasets in evaluations.

Abstract

As the role of knowledge-based systems in IoT keeps growing, ensuring the resource efficiency of RDF stores becomes critical. However, until now, benchmarks of RDF stores were most often conducted with only one dataset, and the differences between datasets were not explored in detail. In this paper, our objective is to close this research gap by experimentally evaluating the load times of eight diverse RDF datasets from the RiverBench benchmark suite. In the experiments, we use five different RDF store implementations and several resource-constrained hardware platforms. To analyze the results, we introduce the notion of relative loading speed (RLS), which allows us to observe that the loading speed can differ between datasets by as much as a factor of 9.01. This serves as clear evidence that "not all RDF is created equal" and stresses the importance of using multiple benchmark datasets in evaluations. We outline the possible reasons for this drastic difference, which should be further investigated in future work. To this end, we have published the data, code, and results of our experiments.

Paper Structure

This paper contains 13 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Loading speed comparison across different RDF stores and platforms. The results were averaged over all datasets. The Y axes are logarithmic. The data points were aggregated per 1 MT for clarity, and the shaded areas indicate the 95% confidence interval.
  • Figure 2: Comparison of dataset sizes to the number of triples that were successfully loaded by a given RDF store on a given hardware platform. If an orange dot (loaded triples) is visible, it indicates that the dataset was not loaded fully in this configuration. The X axes are in logarithmic scale.
  • Figure 3: Comparison of relative loading speed (in triples per second) across different datasets, averaged over all device types and RDF stores. Left: change of relative speed over time. The data points were aggregated per 500 kT for clarity, and the shaded areas indicate the 95% confidence interval. Right: distribution of relative speed for each dataset. The numerical labels indicate the median relative loading speed of a dataset.
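
The Figure 3 caption mentions two processing steps: aggregating data points per 500 kT and reporting a median speed per dataset. The following is a minimal sketch of how such steps might look, not the authors' published code. The input format (pairs of triples loaded so far and a measured speed) and the function names `aggregate_per_bin` and `median_speed` are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, median

# Bin width taken from the Figure 3 caption (500 kT = 500,000 triples).
BIN_SIZE = 500_000


def aggregate_per_bin(samples, bin_size=BIN_SIZE):
    """Average speed samples within fixed-width bins of loaded triples.

    `samples` is an iterable of (triples_loaded, speed) pairs; this input
    format is a hypothetical one chosen for the sketch.
    Returns a list of (bin_start, mean_speed) sorted by bin_start.
    """
    bins = defaultdict(list)
    for triples_loaded, speed in samples:
        bin_start = (triples_loaded // bin_size) * bin_size
        bins[bin_start].append(speed)
    return [(start, mean(values)) for start, values in sorted(bins.items())]


def median_speed(samples):
    """Median of the speed values for one dataset (hypothetical helper)."""
    return median(speed for _, speed in samples)


samples = [(100_000, 10.0), (400_000, 30.0), (600_000, 50.0)]
aggregated = aggregate_per_bin(samples)
dataset_median = median_speed(samples)
```

Here the first two samples fall into the bin starting at 0 and are averaged, while the third lands in the bin starting at 500,000. Fixed-width binning is only one way to thin out a dense series for plotting; the paper may use a different aggregation.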