BARQ: A Vectorized SPARQL Query Execution Engine

Simon Grätzer, Lars Heling, Pavel Klinov

TL;DR

BARQ is a batch-based, vectorized SPARQL execution engine for Stardog that accelerates CPU-bound queries while coexisting with the legacy Volcano-style engine. It implements a vectorized merge join, vectorized streaming aggregation, and adaptive batch sizing, which together reduce per-tuple overhead. The authors document an incremental integration strategy, including batch-to-row adapters and selective operator replacements, and report substantial improvements on CPU-bound workloads with competitive performance on IO-bound ones. The work offers practical lessons for small- to medium-size teams making substantial architectural changes to mature systems and outlines a path to broader vectorization in graph data processing pipelines.

Abstract

Stardog is a commercial Knowledge Graph platform built on top of an RDF graph database whose primary means of communication is a standardized graph query language called SPARQL. This paper describes our journey of developing a more performant query execution layer and plugging it into Stardog's query engine. The new executor, called BARQ, is based on the known principle of processing batches of tuples at a time in most critical query operators, particularly joins. In addition to presenting BARQ, the paper describes the challenges of integrating it into a mature, tightly integrated system based on the classical tuple-at-a-time Volcano model. It offers a gradual approach to overcoming the challenges that small- to medium-size engineering teams typically face. Finally, the paper presents experimental results showing that BARQ makes Stardog substantially faster on CPU-bound queries without sacrificing performance on disk-bound and OLTP-style queries.

Paper Structure

This paper contains 21 sections and 9 figures.

Figures (9)

  • Figure 1: Motivating Example
  • Figure 2: Stardog Architecture Overview
  • Figure 3: A column batch in BARQ
  • Figure 4: Merge Join: Illustrative example of input ranges and the resulting materialized join output.
  • Figure 5: Legacy row-based evaluation
  • ...and 4 more figures