Constant-delay enumeration for SLP-compressed documents

Martín Muñoz, Cristian Riveros

TL;DR

The paper studies constant-delay enumeration for queries over SLP-compressed documents. It introduces Annotated Automata (AnnA) as the query model and a data structure, Shift-ECS, that supports shifting output positions. It shows that an unambiguous AnnA can be evaluated over an SLP $S$ with output-linear delay after a preprocessing phase that takes $O(|\text{AnnA}|^3 \times |S|)$ time, i.e. linear in the size of the SLP and cubic in the size of the automaton. Shift-ECS enables bottom-up evaluation and compact storage of the outputs. Through reductions from VA/eVA to AnnA, the results yield constant-delay enumeration for regular spanners over compressed documents; the paper further extends them to succinctly encoded annotations and to complex document editing that preserves the delay guarantee.

Abstract

We study the problem of enumerating results from a query over a compressed document. The compression model we use is straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries, we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase which takes time linear in the size of the SLP and cubic in the size of the automaton. This is an improvement over Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay that is logarithmic in the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts, which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata which utilizes succinctly encoded annotations to save an exponential factor from previous results that dealt with constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion as Schmid and Schweikardt to allow complex document editing while maintaining the constant-delay guarantee.

Paper Structure (29 sections, 16 theorems, 23 equations, 5 figures)

Key Result

Lemma 2.3

For every annotated automaton $\mathcal{A}$ there exists a deterministic annotated automaton $\mathcal{A}'$ such that $\llbracket \mathcal{A} \rrbracket(d) = \llbracket \mathcal{A}' \rrbracket(d)$ for every $d\in\Sigma^*$.
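One way to picture the lemma is a subset construction over an extended alphabet. The Python sketch below rests on assumptions that this page does not spell out: a transition label is either a plain letter `a` or an annotated pair `(a, o)`; an output is the sequence of pairs `(o, i)` for the annotated positions `i` of an accepting run (positions counted from 1); and "deterministic" means at most one successor per state and label. The functions `determinize` and `outputs` and the example automaton are hypothetical, and this is not claimed to be the paper's proof.

```python
from collections import deque


def determinize(initial, finals, transitions):
    """Subset construction; transitions is a set of (state, label, state)."""
    labels = {label for (_, label, _) in transitions}
    start = frozenset([initial])
    det_transitions = set()
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for label in labels:
            target = frozenset(
                q for (p, l, q) in transitions if p in subset and l == label
            )
            if not target:
                continue
            det_transitions.add((subset, label, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    det_finals = {s for s in seen if s & set(finals)}
    return start, det_finals, det_transitions


def outputs(initial, finals, transitions, doc):
    """Set of outputs of all accepting runs over doc (naive simulation)."""
    configs = {(initial, ())}
    for i, letter in enumerate(doc, start=1):
        nxt = set()
        for (state, out) in configs:
            for (p, label, q) in transitions:
                if p != state:
                    continue
                if label == letter:  # plain letter, no output
                    nxt.add((q, out))
                elif isinstance(label, tuple) and label[0] == letter:
                    nxt.add((q, out + ((label[1], i),)))  # annotated letter
        configs = nxt
    return {out for (state, out) in configs if state in finals}


# Hypothetical nondeterministic automaton: two successors on ("a", "x") from 0.
T = {
    (0, "a", 0), (0, "b", 0),
    (0, ("a", "x"), 1), (0, ("a", "x"), 2),
    (1, "b", 3), (2, "a", 4), (4, "b", 3),
    (3, "a", 3), (3, "b", 3),
}
D = determinize(0, {3}, T)
print(outputs(0, {3}, T, "aab") == outputs(*D, "aab"))  # True
```

Because the annotation is part of the label, the subset automaton reads exactly the same annotated words as the original, which is why the two sets of outputs coincide in this sketch.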

Figures (5)

  • Figure 1: Example of an annotated automaton.
  • Figure 2: An example of a Shift-ECS with output alphabet $\{x, y\}$ where $v_1$ is a product node, $v_2$ is a shift node, $v_3$ is a union node, and $v_4$ and $v_5$ are bottom nodes. We use dashed and solid edges for the left and right partial functions, respectively.
  • Figure 3: Evolution of the stack $\operatorname{St}$ (written on the bottom and represented by dashed arrows) for an iterator over the node $v$ in the figure. The underlying ECS is made of union nodes, two $\mathbb{Z}$ nodes, and six bottom nodes. The first figure is $\operatorname{St}$ after calling $\operatorname{St} \gets {\sf push}(\operatorname{St}, (v,0))$, the second is after calling $\operatorname{St} \gets\textsc{Traverse}(\operatorname{St})$. The last two figures represent successive calls to ${\sf pop}(\operatorname{St}), \operatorname{St} \gets\textsc{Traverse}(\operatorname{St})$.
  • Figure 4: (a) Gadget for $\textsf{prod}(v_1,v_2)$. (b) Gadget for $\textsf{union}(v_3,v_4)$. We use dashed and solid edges for the left and right child, respectively. Node names are in grey at the left of each node. Nodes in square boxes are the input and output nodes of each operation.
  • Figure 5: Gadgets for $\textsf{prod}$ as defined for a Shift-ECS with the $\varepsilon$-node.
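
The node kinds named in the Figure 2 caption (product, shift, union, bottom) can be sketched in Python as follows. The semantics used here is a simplifying assumption, not the paper's definition: a bottom node holds one `(symbol, position)` pair, a union node collects the outputs of both children, a product node concatenates an output of the left child with one of the right child, and a shift node adds an integer `k` to every position below it. The wiring of `v1` to `v5` is hypothetical and is not meant to reproduce the figure. The recursive enumeration is deliberately naive and does not achieve the output-linear delay of the paper's iterator.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Bottom:
    symbol: str
    position: int


@dataclass(frozen=True)
class UnionNode:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Product:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Shift:
    child: "Node"
    k: int


Node = Union[Bottom, UnionNode, Product, Shift]


def enumerate_outputs(node, shift=0):
    """Yield every output below `node`, applying accumulated shifts lazily."""
    if isinstance(node, Bottom):
        yield ((node.symbol, node.position + shift),)
    elif isinstance(node, UnionNode):
        yield from enumerate_outputs(node.left, shift)
        yield from enumerate_outputs(node.right, shift)
    elif isinstance(node, Product):
        for left_out in enumerate_outputs(node.left, shift):
            for right_out in enumerate_outputs(node.right, shift):
                yield left_out + right_out
    elif isinstance(node, Shift):
        # The shift is passed down instead of rewriting the shared subgraph.
        yield from enumerate_outputs(node.child, shift + node.k)


# Hypothetical wiring: v4 is shared by v1 and (through v2, v3) by the shift.
v4 = Bottom("x", 1)
v5 = Bottom("y", 1)
v3 = UnionNode(v4, v5)
v2 = Shift(v3, 2)
v1 = Product(v4, v2)

for out in enumerate_outputs(v1):
    print(out)
# (('x', 1), ('x', 3))
# (('x', 1), ('y', 3))
```

Passing the accumulated shift down as an argument lets the same node `v4` appear both unshifted and shifted by 2 without being copied, which is the kind of sharing a persistent structure relies on.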

Theorems & Definitions (27)

  • Example 2.1
  • Example 2.2
  • Lemma 2.3
  • Theorem 2.4
  • Example 3.1
  • Proposition 3.2
  • proof
  • Proposition 4.1
  • Corollary 4.2
  • Theorem 4.3
  • ...and 17 more