All-Pairs Suffix-Prefix on Fully Dynamic Set of Strings
Masaru Kikuchi, Shunsuke Inenaga
TL;DR
This paper addresses the all-pairs suffix-prefix (APSP) problem in a dynamic setting. It presents an $O(n)$-space data structure that, for each newly arriving string $S_i$, computes both $\mathcal{F}_i$ and $\mathcal{B}_i$ (the suffix-prefix overlaps between $S_i$ and the strings seen so far, one per direction) in $O(|S_i| \log \sigma + i)$ time. The approach uses a DAWG-based framework to update and query overlaps efficiently, and a suffix-tree-based extension handles deletions in the fully dynamic setting, achieving amortized $O(|S_i| \log \sigma + k)$ time per update, where $k$ is the current set size. A separate static APSP algorithm based on AC-automata and a compact prefix trie provides a simple, linear-space baseline that matches the performance of static algorithms up to a $\log \sigma$ factor. Together, these results yield near-optimal dynamic algorithms for APSP, applicable to genome assembly and other string-processing domains, and open avenues for extensions to dynamic hierarchical overlap graphs. The work shows how combining DAWGs, AC-automata, and suffix trees enables efficient maintenance of suffix-prefix relationships in growing and shrinking string collections.
Abstract
The all-pairs suffix-prefix (APSP) problem is a classical problem in string processing which has important applications in bioinformatics. Given a set $\mathcal{S} = \{S_1, \ldots, S_k\}$ of $k$ strings, the APSP problem asks one to compute the longest suffix of $S_i$ that is a prefix of $S_j$ for all $k^2$ ordered pairs $\langle S_i, S_j \rangle$ of strings in $\mathcal{S}$. In this paper, we consider the dynamic version of the APSP problem that allows for insertions of new strings into the set of strings. Our objective is, each time a new string $S_i$ is added to the current set $\mathcal{S}_{i-1} = \{S_1, \ldots, S_{i-1}\}$ of $i-1$ strings, to compute (1) the longest suffix of $S_i$ that is a prefix of $S_j$ and (2) the longest prefix of $S_i$ that is a suffix of $S_j$ for all $1 \leq j \leq i$. We propose an $O(n)$-space data structure which computes (1) and (2) in $O(|S_i| \log \sigma + i)$ time for each new given string $S_i$, where $n$ is the total length of the strings and $\sigma$ is the alphabet size. Further, we show how to extend our methods to the fully dynamic version of the APSP problem allowing for both insertions and deletions of strings.
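
To make the problem statement concrete, the following is a minimal brute-force sketch of the insertion-only setting. It is not the data structure proposed in the paper: it uses plain string comparison and takes time far above the $O(|S_i| \log \sigma + i)$ bound. The names `longest_overlap`, `NaiveDynamicAPSP`, and `insert` are hypothetical and chosen only for illustration. The sketch reports overlap lengths rather than the substrings themselves, and for the self-pair $j = i$ it reports the trivial full-length match, which is an assumption about how that case is treated.

```python
from typing import List, Tuple


def longest_overlap(a: str, b: str) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    # Try candidate lengths from longest to shortest and stop at the first hit.
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


class NaiveDynamicAPSP:
    """Brute-force baseline for the insertion-only APSP setting (illustrative only)."""

    def __init__(self) -> None:
        self.strings: List[str] = []

    def insert(self, s: str) -> Tuple[List[int], List[int]]:
        """Add `s` as the newest string S_i and return two lists indexed by j.

        forward[j]  : length of the longest suffix of S_i that is a prefix of S_j  (task 1)
        backward[j] : length of the longest prefix of S_i that is a suffix of S_j  (task 2)
        The last entry of each list is the self-pair j = i.
        """
        self.strings.append(s)
        forward = [longest_overlap(s, t) for t in self.strings]
        backward = [longest_overlap(t, s) for t in self.strings]
        return forward, backward


if __name__ == "__main__":
    apsp = NaiveDynamicAPSP()
    print(apsp.insert("abcab"))
    print(apsp.insert("cabba"))
```

In this sketch each insertion compares the new string against every stored string directly, so the cost depends on the total length of the collection. Removing that dependence is what the proposed $O(n)$-space structure is for.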
