BARQ: A Vectorized SPARQL Query Execution Engine
Simon Grätzer, Lars Heling, Pavel Klinov
TL;DR
BARQ is a batch-based, vectorized SPARQL execution engine for Stardog that accelerates CPU-bound queries while coexisting with the legacy Volcano-style engine. It implements a vectorized merge join, vectorized streaming aggregation, and adaptive batch sizing, which together reduce per-tuple overhead. The authors document an incremental integration strategy, including batch-to-row adapters and selective operator replacement, and report substantial improvements on CPU-bound workloads with competitive performance on IO-bound workloads. The work offers practical lessons for small- to medium-size teams making large architectural changes to mature systems, and outlines a path to broader vectorization in graph data processing pipelines.
Abstract
Stardog is a commercial Knowledge Graph platform built on top of an RDF graph database whose primary means of communication is SPARQL, a standardized graph query language. This paper describes our journey of developing a more performant query execution layer and plugging it into Stardog's query engine. The new executor, called BARQ, is based on the known principle of processing batches of tuples at a time in the most critical query operators, particularly joins. In addition to presenting BARQ, the paper describes the challenges of integrating it into a mature, tightly coupled system based on the classical tuple-at-a-time Volcano model, and it offers a gradual approach to overcoming the challenges that small- to medium-size engineering teams typically face. Finally, the paper presents experimental results showing that BARQ makes Stardog substantially faster on CPU-bound queries without sacrificing performance on disk-bound and OLTP-style queries.
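
To make the batch-at-a-time principle concrete, the following is a minimal sketch in Python, not BARQ's actual implementation (Stardog's operators are not shown in this section). All names here (`scan_batches`, `filter_batches`, `batch_to_row`) are hypothetical and chosen only for illustration. The sketch shows a batch-producing scan, a filter that handles a whole batch per call, and an adapter that lets a row-oriented (Volcano-style) consumer pull single tuples from a batch-based operator.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]
Batch = List[Row]


def scan_batches(rows: List[Row], batch_size: int) -> Iterator[Batch]:
    """Hypothetical batch-producing scan: yields up to batch_size rows per call."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def filter_batches(batches: Iterable[Batch], min_value: int) -> Iterator[Batch]:
    """Batch-at-a-time filter: one call processes a whole batch, so the
    per-call overhead is paid once per batch rather than once per tuple."""
    for batch in batches:
        kept = [row for row in batch if row[0] >= min_value]
        if kept:  # skip batches that became empty
            yield kept


def batch_to_row(batches: Iterable[Batch]) -> Iterator[Row]:
    """Hypothetical adapter: exposes a batch-based operator to a
    row-oriented consumer by flattening batches into single tuples."""
    for batch in batches:
        yield from batch


rows = [(i, i * i) for i in range(10)]
result = list(batch_to_row(filter_batches(scan_batches(rows, 4), 5)))
```

Under these assumptions, a legacy row-oriented operator can consume `batch_to_row(...)` unchanged, which is the general idea behind replacing operators selectively rather than all at once.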
