Enumeration for MSO-Queries on Compressed Trees

Markus Lohrey, Markus L. Schmid

TL;DR

The paper tackles MSO-query enumeration on forests compressed by straight-line programs (SLPs), proving that every MSO-enumeration task can be performed with preprocessing linear in the compressed size and with output-linear delay. It achieves this by reducing MSO enumeration to path enumeration on decorated DAGs and then extending Bagan’s enumeration algorithm for binary trees to DAG-foldings, operating directly on the compressed representation throughout. A key technical ingredient is a constant-delay (or output-linear-delay) path-enumeration routine for decorated DAGs, which underpins the main result; the approach also supports vertex-relabelling updates in time logarithmic in the size of the uncompressed data. The work combines forest algebras, SLP theory, and MSO automata to enable query evaluation on large, compressed tree-structured data, and it lays the groundwork for extensions to further update types and graph-grammar models.
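To make the notion of path enumeration on a decorated DAG concrete, here is a minimal sketch. It is a naive depth-first traversal, not Bagan's algorithm and not the routine from the paper, and it makes no claim about delay. The `Dag` encoding, the function name `enumerate_paths`, and the integer decorations are assumptions chosen for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical encoding: vertex -> list of (edge decoration, target vertex).
Dag = Dict[str, List[Tuple[int, str]]]


def enumerate_paths(dag: Dag, source: str, targets: Set[str]) -> Iterator[Tuple[int, ...]]:
    """Yield the decoration sequence of every path from `source` to a vertex in `targets`."""
    stack = [(source, ())]
    while stack:
        vertex, decorations = stack.pop()
        if vertex in targets:
            yield decorations
        # Push in reverse so that edges are explored in their listed order.
        for decoration, successor in reversed(dag.get(vertex, [])):
            stack.append((successor, decorations + (decoration,)))


# A DAG with 3 vertices but 4 distinct source-to-target paths.
example: Dag = {"r": [(1, "x"), (2, "x")], "x": [(3, "t"), (4, "t")]}
paths = list(enumerate_paths(example, "r", {"t"}))
```

The example illustrates why sharing matters: the number of paths can be much larger than the number of vertices, so an enumeration procedure has to work on the DAG itself rather than on an unfolded tree.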

Abstract

We study the problem of enumerating the answers to a query formulated in monadic second-order logic (MSO) over an unranked forest F that is compressed by a straight-line program (SLP) D. Our main result states that this can be done after O(|D|) preprocessing and with output-linear delay (in data complexity). This is a substantial improvement over the previously known algorithms for MSO-evaluation over trees, since the compressed size |D| might be much smaller than (or even logarithmic in) the actual data size |F|, and there are linear-time SLP-compressors that yield very good compression on practical inputs. In particular, this also constitutes a meta-theorem in the field of algorithmics on SLP-compressed inputs: all enumeration problems on trees or strings that can be formulated in MSO-logic can be solved with linear preprocessing and output-linear delay, even if the inputs are compressed by SLPs. We also show that our approach can support vertex relabelling updates in time that is logarithmic in the uncompressed data. Our result extends previous work on the enumeration of MSO-queries over uncompressed trees and on the enumeration of document spanners over compressed text documents.
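The claim that |D| can be logarithmic in the data size is easiest to see on strings. The sketch below builds a string SLP (a simplification; the paper works with forest SLPs) whose n + 1 rules derive a string of length 2^n, and computes that length without decompressing. The helper names `power_slp`, `derived_length`, and `expand` are hypothetical.

```python
from functools import lru_cache


def power_slp(n):
    """Rules of an SLP deriving 'a' repeated 2**n times: A0 -> a, Ai -> A(i-1) A(i-1)."""
    rules = {"A0": "a"}
    for i in range(1, n + 1):
        rules[f"A{i}"] = (f"A{i - 1}", f"A{i - 1}")
    return rules, f"A{n}"


def derived_length(rules, start):
    """Length of the derived string, visiting each rule once (no decompression)."""
    @lru_cache(maxsize=None)
    def length(nonterminal):
        rhs = rules[nonterminal]
        if isinstance(rhs, str):          # terminal rule
            return len(rhs)
        return sum(length(x) for x in rhs)
    return length(start)


def expand(rules, start):
    """Fully decompress; takes time exponential in n, shown only for comparison."""
    rhs = rules[start]
    if isinstance(rhs, str):
        return rhs
    return "".join(expand(rules, x) for x in rhs)


rules, start = power_slp(20)
compressed_size = len(rules)                       # 21 rules
uncompressed_size = derived_length(rules, start)   # 2**20 symbols
```

Algorithms whose preprocessing is linear in the number of rules therefore never need to touch the 2^20 symbols of the expanded string.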

Paper Structure (52 sections, 21 theorems, 22 equations, 10 figures, 1 algorithm)

This paper contains 52 sections, 21 theorems, 22 equations, 10 figures, 1 algorithm.

Key Result

Theorem 1.1

Fix an MSO-query $\Psi(X_1, X_2, \ldots, X_k)$. For an unranked forest $F$ that is given in compressed form by a forest SLP $\mathcal{F}$, one can enumerate $\Psi[F]$ after linear preprocessing $\mathcal{O}(|\mathcal{F}|)$ ...

Figures (10)

  • Figure 1: A forest, where every vertex is additionally labelled with its preorder number.
  • Figure 2: A possible input DAG for Theorem \ref{thm-enumerate-paths} after the preprocessing, i.e., the DAG is binary, it has no vertices of outdegree $1$ and the set $V_0$ is the set $\{11, 12, 13\}$ of leaves. Note that the edge decorations are integers labelling the edges.
  • Figure 3: A binary tree $T$ with labels from $\Sigma_0 = \{c,d\}$ and $\Sigma_2 = \{a,b\}$ (left side) and its DAG-folding (right side) with edge labels $\ell$ and $r$ indicating left and right edges. The distinct names of the vertices are omitted for readability, i.e., we only show the labels. Observe that $T$ has the following $6$ distinct subtrees: $c, d, a(cc), a(cd), b(a(cc)a(cd)), a(b(a(cc)a(cd))a(cd))$; thus, its DAG-folding has $6$ vertices.
  • Figure 4: A syntax tree over $(\Sigma^*, \varominus, \varepsilon)$ (left side) and the corresponding s-SLP (right side). The labels $\ell$ and $r$ for left and right edges are implicitly represented by the drawing of the s-SLP (i.e., the left edge of a vertex is always drawn to the left of the right edge).
  • Figure 5: A forest algebra expression.
  • ...and 5 more figures
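
The DAG-folding described in the Figure 3 caption can be reproduced with a small sketch that merges identical subtrees (hash-consing). The nested-tuple encoding of trees and the function names `dag_fold` and `tree_size` are assumptions made for illustration; child position 0 plays the role of the $\ell$ edge and position 1 the role of the $r$ edge.

```python
def dag_fold(tree):
    """Fold a tree given as (label, child, ...) tuples into a DAG with one vertex per distinct subtree."""
    ids = {}       # (label, child vertex ids) -> vertex id
    vertices = {}  # vertex id -> (label, child vertex ids)

    def visit(node):
        label, *children = node
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            ids[key] = len(ids)
            vertices[ids[key]] = key
        return ids[key]

    root = visit(tree)
    return root, vertices


def tree_size(node):
    """Number of nodes of the unfolded tree."""
    return 1 + sum(tree_size(child) for child in node[1:])


# The tree a(b(a(cc)a(cd))a(cd)) from the Figure 3 caption.
c, d = ("c",), ("d",)
a_cc, a_cd = ("a", c, c), ("a", c, d)
T = ("a", ("b", a_cc, a_cd), a_cd)

root, dag = dag_fold(T)
```

On this input the tree has 11 nodes, while the folded DAG has 6 vertices, matching the 6 distinct subtrees listed in the caption.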

Theorems & Definitions (29)

  • Theorem 1.1
  • Theorem 3.1
  • Lemma 4.1
  • Example 4.2
  • Theorem 5.1: cf. CarmeNT04
  • Theorem 6.1
  • Theorem 6.2: cf. MMN22
  • Theorem 6.3
  • Theorem 6.4 (Bagan): cf. Bagan06
  • Definition 6.5
  • ...and 19 more