On the sensitivity of CDAWG-grammars
Hiroto Fujimaru, Shunsuke Inenaga
TL;DR
The paper studies how sensitive the size of the CDAWG-based grammar $\mathsf{G_{CDAWG}}(T)$ is to a single-character edit of the string $T$, where $T'$ denotes the edited string. The grammar size is expressed as $\mathsf{g}(T)=\mathsf{e}(T)-\mathrm{v}^{(1)}(T)$, where $\mathsf{e}(T)$ is the number of edges of $\mathsf{CDAWG}(T)$ and $\mathrm{v}^{(1)}(T)$ is its number of in-degree-one nodes, and the authors analyze how each term varies under an edit. They bound the edge change $\mathsf{e}(T')-\mathsf{e}(T)$ via a partition of maximal repeats and crossing occurrences, and bound the in-degree-one node change $\mathrm{v}^{(1)}(T')-\mathrm{v}^{(1)}(T)$ through a detailed analysis of how nodes in $\mathsf{CDAWG}(T)$ can gain or lose in-edges. Combining these bounds yields the main result: the additive sensitivity satisfies $\mathsf{AS}(\mathsf{g},n) \le 4\mathsf{e}(T) + 4$, i.e., the CDAWG-grammar size increases by at most $4\mathsf{e}+4$ after a single-character edit, where $\mathsf{e}$ is the number of CDAWG edges before the edit. This gives a concrete worst-case bound on the growth of the CDAWG-grammar in terms of the original CDAWG size, with implications for dynamic text compression and substring querying. The work leverages properties of maximal repeats, right-extensions, and the structure of the reversed CDAWG to derive these combinatorial bounds.
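As a reading aid, the decomposition behind this argument can be written out explicitly. This is only a sketch that follows from the identity $\mathsf{g}(T)=\mathsf{e}(T)-\mathrm{v}^{(1)}(T)$ stated above, with $T'$ denoting the edited string; it does not reproduce the paper's individual bounds.

$$
\mathsf{g}(T') - \mathsf{g}(T) = \bigl(\mathsf{e}(T') - \mathsf{e}(T)\bigr) - \bigl(\mathrm{v}^{(1)}(T') - \mathrm{v}^{(1)}(T)\bigr)
$$

Consequently, an upper bound on the edge increase $\mathsf{e}(T')-\mathsf{e}(T)$, together with a lower bound on $\mathrm{v}^{(1)}(T')-\mathrm{v}^{(1)}(T)$ (that is, a limit on how many in-degree-one nodes can be lost), gives an upper bound on the increase in grammar size.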
Abstract
The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs are known to be useful for various string tasks, including text pattern searching, data compression, and pattern discovery. The CDAWG-grammar [Belazzougui & Cunial 2017] is a grammar-based text compression method built on the CDAWG. In this paper, we prove that the CDAWG-grammar size $g$ can increase by at most an additive term of $4e + 4$ after any single-character edit operation is performed on the input string, where $e$ denotes the number of edges in the corresponding CDAWG before the edit.
