Constant sensitivity on the CDAWGs
Rikuya Hamai, Hiroto Fujimaru, Shunsuke Inenaga
TL;DR
The paper investigates how the size of a compact directed acyclic word graph (CDAWG) changes when a single character is edited at an arbitrary position in the input string $T$, yielding an edited string $T'$. The authors give a purely combinatorial analysis of maximal repeats and their right-extensions around the edited position, partitioning new and existing repeats into sets and bounding each set's contribution to the CDAWG edge count. They prove a constant-factor upper bound: the post-edit CDAWG size $\mathsf{e}(T')$ satisfies $\mathsf{e}(T') \le 8\mathsf{e}(T)+4$, i.e., a multiplicative factor of at most $8$ in the limit as the original size $\mathsf{e}(T)$ grows. This establishes that CDAWGs have $O(1)$ multiplicative sensitivity to single-character edits, which suggests they are robust against edits and errors. The bound is not known to be tight: the known lower bound on the factor is $2$, leaving a gap for future refinement.
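To spell out why the additive term does not affect the asymptotic factor, divide the stated bound by $\mathsf{e}(T)$ (assuming $\mathsf{e}(T) > 0$):

$$\frac{\mathsf{e}(T')}{\mathsf{e}(T)} \;\le\; \frac{8\mathsf{e}(T)+4}{\mathsf{e}(T)} \;=\; 8 + \frac{4}{\mathsf{e}(T)} \;\longrightarrow\; 8 \quad \text{as } \mathsf{e}(T) \to \infty.$$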
Abstract
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$, and thus CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation is performed at an arbitrary position in $T$. We show that the size of the CDAWG after an edit operation on $T$ is asymptotically at most 8 times larger than the original CDAWG before the edit.
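To make the size measure concrete, here is a minimal brute-force sketch that counts CDAWG edges for small strings. It is not the paper's method. It rests on two assumptions that are not stated in the text above: the string is terminated by a unique sentinel `$`, and the edge count is obtained from the standard characterization in which the source corresponds to the empty string, every other non-sink node corresponds to a maximal repeat, and each node has one outgoing edge per distinct character that can follow it. The names `occurrences` and `cdawg_edge_count` are hypothetical, and the running time is polynomial but far from linear, so this is only suitable for experimenting with short inputs.

```python
def occurrences(text: str, w: str) -> int:
    """Count (possibly overlapping) occurrences of w in text."""
    return sum(1 for i in range(len(text) - len(w) + 1) if text.startswith(w, i))


def cdawg_edge_count(t: str, sentinel: str = "$") -> int:
    """Brute-force CDAWG edge count of t + sentinel (assumed convention)."""
    if sentinel in t:
        raise ValueError("sentinel must not occur in the input string")
    text = t + sentinel
    alphabet = set(text)
    # All distinct substrings of text, including the empty string (the source).
    substrings = {text[i:j] for i in range(len(text) + 1)
                  for j in range(i, len(text) + 1)}
    edges = 0
    for w in substrings:
        occ = occurrences(text, w)
        if occ < 2:
            continue  # not a repeat
        # w is maximal if every one-character extension occurs strictly less often.
        if any(occurrences(text, a + w) == occ for a in alphabet):
            continue  # not left-maximal
        if any(occurrences(text, w + b) == occ for b in alphabet):
            continue  # not right-maximal
        # One outgoing edge per distinct character that follows w in text.
        edges += sum(1 for b in alphabet if (w + b) in text)
    return edges


if __name__ == "__main__":
    before = cdawg_edge_count("abab")
    after = cdawg_edge_count("abbb")  # one substitution at position 3
    print(before, after, after <= 8 * before + 4)
```

Comparing the counts before and after a single substitution, insertion, or deletion on short strings gives a quick way to observe the growth that the bound controls, under the conventions assumed here.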
