Tight bounds for the sensitivity of CDAWGs with left-end edits
Hiroto Fujimaru, Yuto Nakashima, Shunsuke Inenaga
TL;DR
This work investigates how the size of a CDAWG changes when a single character edit is applied at the left end of the input string. It develops tight additive bounds on the increase in the number of edges, $e' - e$, for left-end insertions, deletions, and substitutions, with the strongest results for insertions: $AS_{\text{LeftIns}}(\mathsf{e},n) \le \mathsf{e}$ and a matching lower bound $AS_{\text{LeftIns}}(\mathsf{e},n) \ge \mathsf{e}-1$, plus near-tight bounds for the other edits. The paper also extends these insights to leftward online construction, proving a quadratic-time lower bound $\Omega(n^2)$ for updating the CDAWG as the string is prepended leftward, both in the plain online and batched settings. Overall, the results establish robust additive sensitivity bounds for CDAWGs under left-end edits and reveal fundamental limits on leftward maintenance algorithms. These findings have implications for CDAWG-based indexing and compression in the presence of edits and errors, and motivate extending the sensitivity framework to arbitrary-position edits and CDAWG-grammar sizes.
Abstract
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$; CDAWGs are therefore a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation (insertion, deletion, or substitution) is performed at the left end of the input string $T$. That is, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ does not exceed $e$. Further, we present a matching lower bound on the sensitivity of CDAWGs for left-end insertions, and almost matching lower bounds for left-end deletions and substitutions. We then generalize our lower-bound instance for left-end insertions to leftward online construction of the CDAWG, and show that it requires $\Omega(n^2)$ time for some string of length $n$.
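To make the quantity being bounded concrete, the following is a minimal brute-force sketch, not the authors' construction. It assumes one common convention for CDAWGs without a terminal symbol: nodes correspond to the maximal substrings of $T$ (substrings that are left-maximal or a prefix, and right-maximal or a suffix), and a node for $x$ has one outgoing edge per distinct character $c$ such that $xc$ occurs in $T$. The function names `cdawg_edges` and `left_insertion_increase` are hypothetical, and the paper's precise definition may differ in details.

```python
def substrings(t):
    """All distinct substrings of t, including the empty string."""
    return {t[i:j] for i in range(len(t)) for j in range(i, len(t) + 1)} | {""}


def cdawg_edges(t):
    """Count CDAWG edges of t by brute force (assumed convention, see lead-in).

    A substring x is treated as a node if it is both
      - left-maximal: preceded by >= 2 distinct characters, or a prefix of t;
      - right-maximal: followed by >= 2 distinct characters, or a suffix of t.
    Each node contributes one edge per distinct right extension.
    """
    subs = substrings(t)
    alphabet = set(t)
    edges = 0
    for x in subs:
        right = {c for c in alphabet if x + c in subs}
        left = {c for c in alphabet if c + x in subs}
        right_maximal = len(right) >= 2 or t.endswith(x)
        left_maximal = len(left) >= 2 or t.startswith(x)
        if left_maximal and right_maximal:
            edges += len(right)
    return edges


def left_insertion_increase(t, a):
    """Edge-count change e(aT) - e(T) when character a is prepended to t."""
    return cdawg_edges(a + t) - cdawg_edges(t)
```

Under this convention, `cdawg_edges("aba")` evaluates to 3 and `left_insertion_increase("b", "a")` evaluates to 1. The enumeration takes polynomial but impractical time, so it is only suitable for checking small strings by hand.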
