ITR: Grammar-based Graph Compression Supporting Fast Triple Queries

Enno Adler, Stefan Böttcher, Rita Hartel

TL;DR

A grammar-based compressor called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges based on RePair outperforms the other graph compressors for all triple SPO queries except for the query-type ?

Abstract

Neighborhood queries and triple queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. We present a compression scheme called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges based on RePair and apply the scheme to network, version, and RDF graphs. We show that ITR performs neighborhood queries and triple queries within only a few milliseconds and thereby outperforms existing RePair-based solutions on graphs while providing a compression size comparable to existing graph compressors.

Paper Structure

This paper contains 5 sections, 4 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Example hypergraph replacement grammar. From subfigure (a) to (b), we replace all occurrences of the digram $d=((g,~1), (f,~0))$ and introduce the rule $B$ shown in subfigure (d). In the digram $d$, $(g,~1)$ means an incoming edge with label $g$ and $(f,~0)$ means an outgoing edge with label $f$. The reverse step from (b) to (a) is called expanding and replacing an occurrence. From (b) to (c), we replace the loop of the edge $B(10, 10, 11)$ by introducing the rule $C$ shown in (e), which uses the rule $B$.
  • Figure 2: (a) illustrates the implementation of the succinct encoding. One edge in the start rule is represented by three parts: a column of the incidence matrix $M$, an edge label, and an ID of an index-function. (b) shows the IDs and the corresponding index-functions, and (c) shows how the index-function 2 stores the order and the repetitions of nodes $\zeta_{e_2}=[10, 11]$ of edge $e_2$.
  • Figure 3: The compression ratio is the file size of the compressed graph divided by the file size of the uncompressed input file. We stopped RDFRePair on wikidata after 6 days. For ITR+, only chess-legal and ttt-win use the same node label for multiple nodes.
  • Figure 4: Average runtime in milliseconds of 500 queries each on the jamendo graph. Implementations that do not appear in the legend do not support any type of query. gRePair can only answer neighborhood queries (S ? ? and ? ? O).
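The query notation used in the Figure 4 caption (S ? ? and ? ? O) can be illustrated with a minimal sketch. It assumes a graph given as a plain list of (subject, predicate, object) triples and scans that list linearly. It is not ITR's query algorithm, which works on the compressed grammar. The function name `match` and the sample triples are hypothetical and serve only to show what each pattern asks for.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the bound positions.

    A position left as None plays the role of '?' in the pattern,
    so match(g, s="a") corresponds to the pattern S ? ?.
    """
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# Hypothetical sample graph, not taken from the paper's datasets.
graph = [
    ("a", "knows", "b"),
    ("a", "likes", "c"),
    ("b", "knows", "c"),
]

outgoing_of_a = list(match(graph, s="a"))      # pattern S ? ?
incoming_of_c = list(match(graph, o="c"))      # pattern ? ? O
knows_edges = list(match(graph, p="knows"))    # pattern ? P ?
```

The two neighborhood queries are the special cases in which only the subject or only the object is bound. The remaining triple patterns follow by binding further positions.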