Rooting Out Entropy: Optimal Tree Extraction for Ultra-Succinct Graphs

Ziad Ismaili Alaoui, Tamio-Vesa Nakajima, Namrata, Sebastian Wild

Abstract

We combine two methods for the lossless compression of unlabeled graphs - entropy compressing adjacency lists and computing canonical names for vertices - and solve an ensuing novel optimisation problem: Minimum-Entropy Tree-Extraction (MINETREX). MINETREX asks to determine a spanning forest $F$ to remove from a graph $G$ so that the remaining graph $G-F$ has minimal indegree entropy $H(d_1,\ldots,d_n) = \sum_{v\in V} d_v \log_2(m/d_v)$ among all choices for $F$. (Here $d_v$ is the indegree of vertex $v$ in $G-F$; $m$ is the number of edges.) We show that MINETREX is NP-hard to approximate with additive error better than $\delta n$ (for some constant $\delta>0$), and provide a simple greedy algorithm that achieves additive error at most $n / \ln 2$. By storing the extracted spanning forest and the remaining edges separately, we obtain a degree-entropy compressed ("ultrasuccinct") data structure for representing an arbitrary (static) unlabeled graph that supports navigational graph queries in logarithmic time. It serves as a drop-in replacement for adjacency-list representations using substantially less space for most graphs; we precisely quantify these savings in terms of the maximal subgraph density. Our inapproximability result uses an approximate variant of the hitting set problem on biregular instances whose hardness proof is contained implicitly in a reduction by Guruswami and Trevisan (APPROX/RANDOM 2005); we consider the unearthing of this reduction partner of independent interest with further likely uses in hardness of approximation.
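To make the MINETREX objective concrete, the following minimal Python sketch evaluates the indegree entropy $H(d_1,\ldots,d_n)$ of $G-F$ for two different spanning trees of a toy graph. It only evaluates the objective; it is not the greedy algorithm from the paper. The function names, the toy graph, and the reading of $m$ as the number of edges remaining in $G-F$ are assumptions made for illustration, and the graph is assumed to be simple (no parallel edges).

```python
from math import log2

def indegree_entropy(indegrees):
    """Return sum_v d_v * log2(m / d_v), skipping vertices with d_v = 0.

    Assumption: m is the sum of the given indegrees, i.e. the number
    of edges of the graph the indegrees were taken from.
    """
    m = sum(indegrees)
    return sum(d * log2(m / d) for d in indegrees if d > 0)

def remaining_indegrees(n, edges, forest):
    """Indegrees of G - F for directed edges (u, v) on vertices 0..n-1."""
    removed = set(forest)
    deg = [0] * n
    for u, v in edges:
        if (u, v) not in removed:
            deg[v] += 1
    return deg

# Hypothetical toy graph on 4 vertices with two candidate spanning trees.
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]   # leaves both remaining edges pointing at vertex 3
path = [(0, 1), (1, 3), (2, 3)]   # leaves edges pointing at two different vertices

h_star = indegree_entropy(remaining_indegrees(4, edges, star))
h_path = indegree_entropy(remaining_indegrees(4, edges, path))
```

In this toy instance extracting `star` leaves entropy 0 bits and extracting `path` leaves 2 bits, which illustrates on a small scale why the choice of the extracted forest matters.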

Paper Structure

This paper contains 35 sections, 31 theorems, 51 equations, 4 figures.

Key Result

theorem 1

Fix some vector $(d_1, \ldots, d_n) \in \mathbb{N}^n$ with $m = \sum_i d_i$. Fix some positive integer $k$. Fix also some class $\mathcal{C}$. Consider the following two objective functions: … The following holds for any $(x_1, \ldots, x_n) \in \mathcal{C}$: … Hence, if we can minimise $\text{Opt-Lin}$ over $\mathcal{C}$ exactly, then we can minimise $\text{Opt-Ent}$ …

Figures (4)

  • Figure 1: Example showing the significance of the tree in Minimum-Entropy Tree-Extraction. In the example graph (middle), tree extraction can, depending on the chosen tree (red dotted), either (left) yield the optimal space saving of $\sim n \lg n$ bits for the entropy-compressed representation of the remaining edges (black), or (right) negligible saving of $O(n)$ bits, which is comparable to its overhead to store the extracted tree. We point out that the directions of edges in the extracted tree can be arbitrary; tree extraction is not limited to deleting an arborescence.
  • Figure 2: Illustration of the ultrasuccinct TREX data structure on a directed graph $G$. The dashed edges form the spanning tree $T$ (computed via the MST-approximation algorithm from \ref{sec:approximation}). The vertices are labelled as 'new:old' where the new labels are induced by the level-order traversal of the rooted tree $T$. The array $D$ stores the direction of edges in $T$. The array $A'$ represents the adjacency list of $G$ after removing $T$, in the order in which vertices appear in level-order. The one-bits in $S'$ mark the start indices of the outneighbourhoods of vertices in $A'$.
  • Figure 3: Example of a graph where tree extraction does not lower entropy. The clique on the left has $\sqrt n$ vertices in general.
  • Figure 4: Reduction of X3C to MINETREX (the left-hand-side bipartition represents the subcollections, and the right-hand-side bipartition, the universe set).
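The layout described in the caption of Figure 2 can be sketched with plain Python lists. This is a rough illustration under stated assumptions, not the paper's succinct encoding: the function name `trex_layout`, the choice of root, the tie-breaking by old label, the convention that a 1 in `D` means "edge points from parent to child", and the use of a plain `starts` offset list in place of the bit vector $S'$ are all hypothetical. The caption does not say how $S'$ treats empty outneighbourhoods, so the sketch sidesteps that question.

```python
from collections import deque

def trex_layout(n, edges, tree_edges, root=0):
    """Plain-list stand-ins for the arrays named in the Figure 2 caption.

    edges, tree_edges: directed pairs (u, v) over old labels 0..n-1;
    tree_edges must form a spanning tree when directions are ignored.
    """
    tree = set(tree_edges)
    # Undirected adjacency of T, used only for the traversal.
    nbrs = [[] for _ in range(n)]
    for u, v in tree_edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    # Level-order (BFS) traversal of T from the root induces the new labels.
    order, parent = [root], {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(nbrs[u]):
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    new = {old: i for i, old in enumerate(order)}

    # D: one bit per non-root vertex; 1 if its tree edge points parent -> child.
    D = [1 if (parent[v], v) in tree else 0 for v in order[1:]]

    # A: outneighbours in G - T, concatenated in level order, in new labels.
    out = [[] for _ in range(n)]
    for u, v in edges:
        if (u, v) not in tree:
            out[new[u]].append(new[v])
    A, starts = [], []
    for lst in out:
        starts.append(len(A))  # offset where this vertex's list begins
        A.extend(sorted(lst))
    return order, D, A, starts
```

For instance, with the hypothetical edge list `[(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]` and tree `[(0, 1), (1, 3), (2, 3)]`, the level order is `[0, 1, 3, 2]`, and the two non-tree edges end up as the single outneighbour list of the root.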

Theorems & Definitions (50)

  • theorem 1
  • corollary 1
  • proof
  • corollary 2
  • theorem 2
  • theorem 3
  • theorem 4
  • theorem 5
  • theorem 6: Ultrasuccinct TREX
  • lemma 1: Domination, see e.g. Cardinal, Fiorini, and Joret (2008)
  • ...and 40 more