New Compressed Indices for Multijoins on Graph Databases

Diego Arroyuelo, Fabrizio Barisione, Antonio Fariña, Adrián Gómez-Brandón, Gonzalo Navarro

TL;DR

The paper tackles efficient evaluation of worst-case-optimal multijoins on graph databases by proposing compact index structures that approach raw data space while supporting LTJ in $\tilde{O}(Q^*)$ time. It introduces The Ring and RDFCSA, along with URing, to provide strong space-time tradeoffs, and combines them with adaptive variable elimination orders and refined intersection estimators. Empirical results on Wikidata show substantial speedups for obtaining the first 1000 results (up to 13x faster) and robust performance with larger savings in space, often surpassing traditional wco indices. The work highlights adaptive query planning and sophisticated cost estimators as key levers for improving practical performance, while noting open challenges around disk-based storage and dynamic updates.

Abstract

A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space-time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average query times of current compact representations by a factor of up to 13 to produce the first 1000 results, and using twice their space, reduce their total average query time by a factor of 2. Our experiments suggest that there is more room for improvement in terms of generating better query plans for multijoins.

Paper Structure (35 sections, 5 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: A labeled graph $G'$ with its string to integer mapping and tries for orders pso and pos.
  • Figure 2: Example of the wavelet tree for the sequence $\{5,3,1,4,6,6,6,6,6,6,6,6,6\}$. The ranges at the left depict the alphabet range of each bitmap. The arrows show the procedure to obtain the $4$th value of the sequence.
  • Figure 3: The ring representation of the graph of Fig. 1. The horizontal lines mark the values of $A_\textsc{s}$, $A_\textsc{o}$, $A_\textsc{p}$, left to right.
  • Figure 4: Structures involved in the construction of rdfcsa$^{\textsc{spo}}$ (i.e., $D$ and $\Psi$) for the graph in Fig. 1.
  • Figure 5: Index space and the averaged query times of the adaptive variants in msec, limiting outputs to 1000 results. Suffixes s and l mean small and large, respectively.
  • ...and 3 more figures