All-Pairs Suffix-Prefix on Fully Dynamic Set of Strings

Masaru Kikuchi, Shunsuke Inenaga

TL;DR

This paper addresses the all-pairs suffix-prefix (APSP) problem in dynamic settings. It presents an $O(n)$-space data structure that, for each newly arriving string $S_i$, computes both $\mathcal{F}_i$ and $\mathcal{B}_i$ (the longest suffix-prefix overlaps between $S_i$ and the strings in the set, in both directions) in $O(|S_i| \log \sigma + i)$ time, where $n$ is the total length of the strings and $\sigma$ is the alphabet size. The insertion-only structure is built on a DAWG; a suffix-tree-based extension also handles deletions, giving a fully dynamic structure with amortized $O(|S_i| \log \sigma + k)$ time per update, where $k$ is the current number of strings. A separate static APSP algorithm based on the Aho-Corasick (AC) automaton and a compact prefix trie serves as a simple, linear-space baseline whose running time matches the known static bounds up to a $\log \sigma$ factor. Together, these results yield near-optimal dynamic algorithms for APSP, applicable to genome assembly and other string-processing domains, and open avenues for extensions to dynamic hierarchical overlap graphs. The work shows how combining DAWGs, AC-automata, and suffix trees allows suffix-prefix relationships to be maintained efficiently as a string collection grows and shrinks.

Abstract

The all-pairs suffix-prefix (APSP) problem is a classical problem in string processing which has important applications in bioinformatics. Given a set $\mathcal{S} = \{S_1, \ldots, S_k\}$ of $k$ strings, the APSP problem asks one to compute the longest suffix of $S_i$ that is a prefix of $S_j$ for all $k^2$ ordered pairs $\langle S_i, S_j \rangle$ of strings in $\mathcal{S}$. In this paper, we consider the dynamic version of the APSP problem that allows for insertions of new strings to the set of strings. Our objective is, each time a new string $S_i$ arrives to the current set $\mathcal{S}_{i-1} = \{S_1, \ldots, S_{i-1}\}$ of $i-1$ strings, to compute (1) the longest suffix of $S_i$ that is a prefix of $S_j$ and (2) the longest prefix of $S_i$ that is a suffix of $S_j$ for all $1 \leq j \leq i$. We propose an $O(n)$-space data structure which computes (1) and (2) in $O(|S_i| \log \sigma + i)$ time for each new given string $S_i$, where $n$ is the total length of the strings. Further, we show how to extend our methods to the fully dynamic version of the APSP problem allowing for both insertions and deletions of strings.
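
To make the two queries in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure: it compares the new string against every stored string character by character, so it is far slower than the $O(|S_i| \log \sigma + i)$ bound. The names `overlap` and `NaiveDynamicAPSP` are hypothetical, and the sketch assumes the self-pair $j = i$ is skipped.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


class NaiveDynamicAPSP:
    """Brute-force baseline for the insertion-only problem (illustration only)."""

    def __init__(self) -> None:
        self.strings: list[str] = []

    def insert(self, s: str) -> tuple[list[int], list[int]]:
        # (1) longest suffix of the new string that is a prefix of each stored string
        forward = [overlap(s, t) for t in self.strings]
        # (2) longest prefix of the new string that is a suffix of each stored string
        backward = [overlap(t, s) for t in self.strings]
        self.strings.append(s)
        return forward, backward


apsp = NaiveDynamicAPSP()
apsp.insert("abaa")
apsp.insert("abac")
print(apsp.insert("bab"))  # ([2, 2], [0, 0]): "ab" ends "bab" and starts both stored strings
```

Each insertion here costs time proportional to the total length of the stored strings in the worst case, which is the cost the linear-space structure in the paper is designed to avoid.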

Paper Structure (20 sections, 11 theorems, 9 equations, 6 figures)

This paper contains 20 sections, 11 theorems, 9 equations, 6 figures.

Key Result

Theorem 1

For a set $\mathcal{S}$ of strings of total length $n$, $\mathsf{AC}(\mathcal{S})$ can be built

Figures (6)

  • Figure 1: Illustrations of $\mathsf{AC}(\mathcal{S})$ (left) and $\mathsf{ComTrie}(\mathcal{S})$ (right) for the set $\mathcal{S} = \{\mathrm{abaa, abac, abb, abcb, bab, babaa, bb, bbaa, bbba}\}$ of strings. The bold solid arcs represent trie edges and the dashed arcs represent failure links. The nodes representing the strings in $\mathcal{S}$ are depicted by double-lined circles with the string ids.
  • Figure 2: $\mathsf{DAWG}(\mathcal{S})$ for the same set $\mathcal{S} = \{\mathrm{abaa, abac, abb, abcb, bab, babaa, bb, bbaa, bbba}\}$ of strings as in Fig. 1. The induced tree consisting only of the double-lined arcs is $\mathsf{Trie}(\mathcal{S})$.
  • Figure 3: Illustration of the suffix links of $\mathsf{DAWG}(\mathcal{S})$ for the same set $\mathcal{S}$ of strings as in Fig. 2.
  • Figure 4: $\mathsf{STree}(\mathcal{S})$ for the same set $\mathcal{S} = \{\mathrm{abaa, abac, abb, abcb, bab, babaa, bb, bbaa, bbba}\}$ of strings as in Fig. 1. The induced tree consisting only of the double-lined arcs is a compacted version of $\mathsf{Trie}(\mathcal{S})$.
  • Figure 5: Illustration of the suffix links of $\mathsf{STree}(\mathcal{S})$ for the same set $\mathcal{S}$ of strings as in Fig. 4.
  • ...and 1 more figures
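
The trie edges and failure links described in the caption of Figure 1 can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the paper's algorithm: the function names `build_ac` and `static_apsp` are hypothetical, the self-pair $i = j$ is left at 0, and the overlap lookup re-walks each string for every pair, so it runs in roughly $O(kn)$ time rather than the near-linear time of the static baseline. It relies on the standard property that the failure chain of the node spelling $S_i$ visits exactly those suffixes of $S_i$ that are prefixes of some string in the set.

```python
from collections import deque


def build_ac(strings):
    """Build trie edges and failure links (Aho-Corasick) for `strings`."""
    children = [{}]  # children[v][c] -> child of node v via character c
    fail = [0]       # fail[v] -> node spelling the longest proper suffix of v in the trie
    depth = [0]      # depth[v] -> length of the string spelled by node v
    terminal = []    # terminal[i] -> node spelling strings[i]
    for s in strings:
        v = 0
        for c in s:
            if c not in children[v]:
                children.append({})
                fail.append(0)
                depth.append(depth[v] + 1)
                children[v][c] = len(children) - 1
            v = children[v][c]
        terminal.append(v)
    # Breadth-first traversal; depth-1 nodes keep the root as their failure target.
    queue = deque(children[0].values())
    while queue:
        v = queue.popleft()
        for c, w in children[v].items():
            f = fail[v]
            while f and c not in children[f]:
                f = fail[f]
            fail[w] = children[f].get(c, 0)
            queue.append(w)
    return children, fail, depth, terminal


def static_apsp(strings):
    """result[i][j] = length of the longest suffix of strings[i] that is a prefix of strings[j]."""
    children, fail, depth, terminal = build_ac(strings)
    k = len(strings)
    result = [[0] * k for _ in range(k)]
    for i in range(k):
        # Collect the non-root nodes on the failure chain of S_i.
        chain = set()
        v = terminal[i]
        while v:
            chain.add(v)
            v = fail[v]
        for j, s in enumerate(strings):
            if i == j:
                continue
            v = 0
            for c in s:  # walk the root-to-S_j path; the deepest chain node wins
                v = children[v][c]
                if v in chain:
                    result[i][j] = depth[v]
    return result


S = ["abaa", "abac", "abb", "abcb", "bab", "babaa", "bb", "bbaa", "bbba"]
table = static_apsp(S)
print(table[4][0])  # 2: "ab" is a suffix of "bab" and a prefix of "abaa"
```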

Theorems & Definitions (14)

  • Theorem 1: DoriL06, Aho1975StringMatching
  • Theorem 2: Blumer1987
  • Theorem 3: Weiner73, Ukkonen95, TakagiIABH20
  • Theorem 4
  • proof
  • Corollary 1
  • proof
  • Theorem 5
  • Corollary 2
  • Lemma 1
  • ...and 4 more