Enumeration for MSO-Queries on Compressed Trees
Markus Lohrey, Markus L. Schmid
TL;DR
The paper tackles MSO-query enumeration on unranked forests compressed by straight-line programs (SLPs), proving that every MSO-enumeration task can be performed with preprocessing linear in the compressed size and with output-linear delay. It achieves this by reducing MSO enumeration to path enumeration on decorated DAGs and then extending Bagan's binary-tree enumeration algorithm to handle DAG-foldings, operating directly on the compressed representation throughout. A key technical advance is a constant-delay (or output-linear) path-enumeration routine for decorated DAGs, which underpins the main result and also supports vertex relabelling updates in time logarithmic in the size of the uncompressed data. The work combines forest algebras, SLP theory, and MSO automata to enable scalable query evaluation on large, compressed tree-structured data, and it lays the groundwork for extensions to further update types and graph-grammar models.
Abstract
We study the problem of enumerating the answers to a query formulated in monadic second-order logic (MSO) over an unranked forest F that is compressed by a straight-line program (SLP) D. Our main result states that this can be done after O(|D|) preprocessing and with output-linear delay (in data complexity). This is a substantial improvement over the previously known algorithms for MSO-evaluation over trees, since the compressed size |D| might be much smaller than (or even logarithmic in) the actual data size |F|, and there are linear-time SLP compressors that yield very good compression on practical inputs. In particular, this also constitutes a meta-theorem in the field of algorithmics on SLP-compressed inputs: all enumeration problems on trees or strings that can be formulated in MSO can be solved with linear preprocessing and output-linear delay, even if the inputs are compressed by SLPs. We also show that our approach can support vertex relabelling updates in time that is logarithmic in the size of the uncompressed data. Our result extends previous work on the enumeration of MSO-queries over uncompressed trees and on the enumeration of document spanners over compressed text documents.
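To make the gap between |D| and the uncompressed size concrete, the following minimal sketch uses the simpler string case rather than the forest SLPs studied in the paper. It assumes an SLP is stored as a dictionary of rules, each either a single terminal or a pair of nonterminals. The names `doubling_slp`, `derived_length`, and `expand` are hypothetical helpers introduced only for this illustration and do not come from the paper.

```python
def doubling_slp(n):
    """Build a string SLP with n + 1 rules that derives 'a' repeated 2**n times.

    Assumed representation: a rule is either a terminal string
    or a tuple of nonterminal names.
    """
    rules = {"X0": "a"}  # X0 -> a
    for i in range(1, n + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")  # Xi -> X(i-1) X(i-1)
    return rules, f"X{n}"


def derived_length(rules, start):
    """Length of the derived string, computed without expanding it.

    Each nonterminal is evaluated once (memoisation), so the running
    time is linear in the size of the SLP, not in the derived string.
    """
    memo = {}

    def length(sym):
        if sym not in memo:
            rhs = rules[sym]
            if isinstance(rhs, str):
                memo[sym] = len(rhs)
            else:
                memo[sym] = sum(length(s) for s in rhs)
        return memo[sym]

    return length(start)


def expand(rules, sym):
    """Fully decompress a nonterminal; only sensible for small examples."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return "".join(expand(rules, s) for s in rhs)


if __name__ == "__main__":
    rules, start = doubling_slp(40)
    print(len(rules), derived_length(rules, start))
```

Here 41 rules describe a string of length 2^40, so an algorithm whose preprocessing is linear in |D| touches only the 41 rules, whereas one that first decompresses would need time proportional to 2^40.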
