ITR: Grammar-based Graph Compression Supporting Fast Triple Queries

Enno Adler; Stefan Böttcher; Rita Hartel

TL;DR

Incidence-Type-RePair (ITR), a RePair-based grammar compressor for graphs with labeled nodes and labeled edges, outperforms the other graph compressors for all triple SPO queries except for the query-type ?

Abstract

Neighborhood queries and triple queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. We present a compression scheme called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges based on RePair and apply the scheme to network, version, and RDF graphs. We show that ITR performs neighborhood queries and triple queries within only a few milliseconds and thereby outperforms existing RePair-based solutions on graphs while providing a compression size comparable to existing graph compressors.

Paper Structure

This paper contains 5 sections, 4 equations, 4 figures, 1 table, 1 algorithm.

Introduction
Preliminaries
Compression
Experimental Results
Summary and Conclusion

Figures (4)

Figure 1: Example hypergraph replacement grammar. From subfigure (a) to (b), we replace all occurrences of the digram $d=((g,~1), (f,~0))$ and introduce the rule $B$ shown in subfigure (d). In the digram $d$, $(g,~1)$ means an incoming edge with label $g$ and $(f,~0)$ means an outgoing edge with label $f$. The reverse step from (b) to (a) is called expanding and replacing an occurrence. From (b) to (c), we replace the loop of the edge $B(10, 10, 11)$ by introducing the rule $C$ shown in (e), which uses the rule $B$.
Figure 2: (a) illustrates the implementation of the succinct encoding. One edge in the start rule is represented by three parts: a column of the incidence matrix $M$, an edge label, and an ID of an index-function. (b) shows the IDs and the corresponding index-functions, and (c) shows how the index-function 2 stores the order and the repetitions of nodes $\zeta_{e_2}=[10, 11]$ of edge $e_2$.
Figure 3: The compression ratio is the file size of the compressed graph divided by the file size of the uncompressed input file. We stopped RDFRePair on wikidata after 6 days. For ITR+, only chess-legal and ttt-win use the same node label for multiple nodes.
Figure 4: Average runtime in milliseconds of 500 queries each on the jamendo graph. Implementations that do not appear in the legend do not support any type of query. gRePair can only answer neighborhood queries (S ? ? and ? ? O).
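
The digram notation in the Figure 1 caption can be made concrete with a small sketch. It assumes, as the caption describes, that an incidence type is a pair of an edge label and a position, where position 0 marks an outgoing edge and position 1 an incoming edge at a node. The function names `incidence_types` and `count_digrams`, the triple representation, and the simplified counting rule are illustrative assumptions, not the authors' implementation; in particular, self-loops and overlapping occurrences are not treated specially here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def incidence_types(triples):
    """Map each node to a Counter of its incidence types (label, position)."""
    by_node = defaultdict(Counter)
    for s, p, o in triples:
        by_node[s][(p, 0)] += 1  # position 0: the edge leaves this node
        by_node[o][(p, 1)] += 1  # position 1: the edge enters this node
    return by_node


def count_digrams(triples):
    """Count, per pair of distinct incidence types, how often both meet at a node.

    Simplification (assumption): at each node, a pair (a, b) contributes
    min(count of a, count of b) pairings.
    """
    counts = Counter()
    for types in incidence_types(triples).values():
        for a, b in combinations(sorted(types), 2):
            counts[(a, b)] += min(types[a], types[b])
    return counts


# Hypothetical toy graph: two nodes (2 and 5) have an incoming g-edge
# and an outgoing f-edge, matching the shape of d = ((g, 1), (f, 0)).
triples = [(1, "g", 2), (2, "f", 3), (4, "g", 5), (5, "f", 6), (5, "f", 7)]
digrams = count_digrams(triples)
best = max(digrams, key=digrams.get)
```

A RePair-style compressor would repeatedly pick a frequent digram such as `best`, replace its occurrences by a new nonterminal edge, and record the corresponding grammar rule; the sketch stops at the counting step.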
