Faster Space-Efficient STR-IC-LCS Computation

Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

TL;DR

The paper addresses the STR-IC-LCS problem: given strings $A$ and $B$ of length $n$ and a pattern $P$ of length $r$, find a longest common subsequence of $A$ and $B$ that contains $P$ as a substring. It presents three algorithms, stated in terms of the LCS length $\ell$ of $A$ and $B$ and the STR-IC-LCS length $\ell'$. Algorithm I computes the length in $O(n^2)$ time using $O((\ell+1)(n-\ell+1))$ space. Algorithm II runs in $O(nr/\log r + n(n-\ell+1))$ time within the same space bound by pruning candidates. Algorithm III runs in $O(nr/\log r + n(n-\ell'+1))$ time and $O((\ell'+1)(n-\ell'+1))$ space. All three build on Nakatsu et al.'s sparse LCS framework and use minimal intervals for $P$ (computed via Das et al. in Algorithm II). The space bounds fall below quadratic, down to $O(n)$ when $\ell = O(1)$ or $n-\ell = O(1)$, and in many cases the running time is subquadratic as well, while an STR-IC-LCS string can still be reconstructed. The results advance constrained LCS computation, with implications for sequence analysis where pattern constraints are essential, and establish near-optimal time bounds under standard hardness assumptions like SETH.

Abstract

One of the most fundamental methods for comparing two given strings $A$ and $B$ is the longest common subsequence (LCS), where the task is to find (the length of) an LCS of $A$ and $B$. In this paper, we deal with the STR-IC-LCS problem, which is one of the constrained LCS problems proposed by Chen and Chao [J. Comb. Optim., 2011]. A string $Z$ is said to be an STR-IC-LCS of three given strings $A$, $B$, and $P$ if $Z$ is a longest string such that (1) $Z$ includes $P$ as a substring and (2) $Z$ is a common subsequence of $A$ and $B$. We present three efficient algorithms for this problem. First, we give a space-efficient solution that computes the length of an STR-IC-LCS in $O(n^2)$ time and $O((\ell+1)(n-\ell+1))$ space, where $\ell$ is the length of an LCS of $A$ and $B$, each of length $n$. When $\ell = O(1)$ or $n-\ell = O(1)$, this algorithm uses only linear $O(n)$ space. Second, we present a faster algorithm that works in $O(nr/\log{r}+n(n-\ell+1))$ time, where $r$ is the length of $P$, while retaining the $O((\ell+1)(n-\ell+1))$ space efficiency. Third, we give an alternative algorithm that runs in $O(nr/\log{r}+n(n-\ell'+1))$ time with $O((\ell'+1)(n-\ell'+1))$ space, where $\ell'$ denotes the STR-IC-LCS length for input strings $A$, $B$, and $P$.
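
To make the problem definition concrete, here is a minimal Python sketch of a straightforward quadratic-space baseline. It is not any of the paper's three algorithms. It combines full prefix and suffix LCS tables with greedily computed intervals that contain $P$ as a subsequence. The function names (`lcs_prefix_table`, `greedy_intervals`, `str_ic_lcs_length`) and the choice to return `None` when no valid string exists are assumptions made for illustration.

```python
def lcs_prefix_table(a, b):
    """t[i][j] = length of an LCS of a[:i] and b[:j] (standard DP)."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def greedy_intervals(s, p):
    """Half-open (start, end) pairs with s[start] == p[0], where end is the
    earliest position such that p is a subsequence of s[start:end]."""
    out = []
    for start in range(len(s)):
        if s[start] != p[0]:
            continue
        k, pos = 0, start
        while pos < len(s) and k < len(p):
            if s[pos] == p[k]:
                k += 1
            pos += 1
        if k == len(p):
            out.append((start, pos))
    return out


def str_ic_lcs_length(a, b, p):
    """Length of an STR-IC-LCS of a, b, p, or None if none exists.
    Baseline sketch only: uses O(|a||b|) space, unlike the paper's methods."""
    pre = lcs_prefix_table(a, b)
    if not p:
        return pre[len(a)][len(b)]
    # rev[x][y] = LCS of the last x characters of a and the last y of b.
    rev = lcs_prefix_table(a[::-1], b[::-1])
    n, m = len(a), len(b)
    best = None
    for sa, ea in greedy_intervals(a, p):
        for sb, eb in greedy_intervals(b, p):
            # LCS before the intervals + |P| + LCS after the intervals.
            cand = pre[sa][sb] + len(p) + rev[n - ea][m - eb]
            if best is None or cand > best:
                best = cand
    return best
```

For example, `str_ic_lcs_length("abcab", "acbab", "cb")` evaluates to 3 (for instance via the string `acb`), whereas the unconstrained LCS of the same two strings has length 4.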

Paper Structure

This paper contains 13 sections, 9 theorems, 2 equations, 8 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

The STR-IC-LCS problem can be solved in $O(n^2)$ time and $O((\ell+1)(n-\ell+1))$ space, where $n$ is the length of $A$ and $B$ and $\ell$ is the length of an LCS of $A$ and $B$.
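
As a quick sanity check on the space bound (a worked calculation, not taken from the paper), the product $(\ell+1)(n-\ell+1)$ is linear at the extremes and largest in the middle:

$$
(\ell+1)(n-\ell+1) =
\begin{cases}
n+1 & \text{if } \ell = 0 \text{ or } \ell = n,\\
(n/2+1)^2 & \text{if } \ell = n/2,
\end{cases}
\qquad
(\ell+1)(n-\ell+1) \le \left(\frac{n+2}{2}\right)^2 \text{ for all } 0 \le \ell \le n.
$$

For $n = 1000$, this gives $11 \cdot 991 = 10901$ cells when $\ell = 10$ or $\ell = 990$, and $501^2 = 251001$ cells when $\ell = 500$, compared with $1001^2 = 1002001$ cells for a full $(n+1) \times (n+1)$ LCS table.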

Figures (8)

  • Figure 1: Let $A = \mathtt{bcdababcb}$, $B = \mathtt{cbacbaaba}$, and $P = \mathtt{abb}$. The length of an STR-IC-LCS of these strings is 6. One such string can be obtained from the minimal intervals $[4..7]$ over $A$ and $[6..8]$ over $B$ because $\mathsf{lcs}(\mathtt{bca},\mathtt{cbacb}) = 2$, $|P| = 3$, and $\mathsf{lcs}(\mathtt{cb},\mathtt{c}) = 1$.
  • Figure 2: The LCS table $f_A$, as defined by Nakatsu et al., for $A = \mathtt{bcdababcb}$. This figure also illustrates the table $f_B$ for $B = \mathtt{cbacbaaba}$.
  • Figure 3: A sparse table $F_A$ of $f_A$ for $A = \mathtt{bcdababcb}$ and $B = \mathtt{cbacbaaba}$ does not give $\mathsf{lcs}(A[1..i], B[1..j])$ for some $(i, j)$.
  • Figure 4: By Observation \ref{obs:lcs}, $f_A(3,7)$ shows that $\mathsf{lcs}(A[1..7],B[1..4]) = 3$. However, $F_A(3,7) = \mathsf{undefined}$. The fact that $\mathsf{lcs}(A[1..7],B[1..4]) = 3$ can instead be obtained from $F_B$: namely, $F_B(3,4)$ gives the LCS value.
  • Figure 5: An illustration for the proof of Lemma \ref{lem:recover-lcs} (and Lemma \ref{lem:visible-lcs}). The length $s$ of an LCS of $A[1..i]$ and $B[1..j]$ cannot be obtained from $F_A$ because $F_A(s,i) = \mathsf{undefined}$ (the highlighted cell). However, the length can be obtained from $F_B(s,j)$ in $F_B$. The existence of $F_B(s+m,j_{s+m})$ on an LCS path guarantees that $F_B(s,j) \neq \mathsf{undefined}$.
  • ...and 3 more figures
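
The role of the table $f_A$ in Figures 2 to 4 can be sketched in Python as follows. The sketch assumes one common formulation of Nakatsu et al.'s table, in which $f_A(s,i)$ is the length of the shortest prefix of $B$ whose LCS with $A[1..i]$ has length at least $s$ (infinity standing in for "undefined"). It builds the full table, not the sparse tables $F_A$ and $F_B$ used in the paper, and the names `nakatsu_table` and `lcs_from_f` are hypothetical.

```python
import math


def nakatsu_table(a, b):
    """f[s][i] = least j with lcs(a[:i], b[:j]) >= s, or math.inf if none.
    Full table for illustration; the paper stores only a sparse part."""
    n = len(a)
    f = [[math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        f[0][i] = 0  # an empty common subsequence needs no prefix of b
    for s in range(1, n + 1):
        for i in range(s, n + 1):
            keep = f[s][i - 1]       # best value without using a[i-1]
            prev = f[s - 1][i - 1]   # prefix of b already used for s-1 matches
            ext = math.inf
            if prev != math.inf:
                # first occurrence of a[i-1] in b strictly after prefix b[:prev]
                j = b.find(a[i - 1], prev)
                if j != -1:
                    ext = j + 1
            f[s][i] = min(keep, ext)
    return f


def lcs_from_f(f, i, j):
    """Recover lcs(a[:i], b[:j]) as the largest s with f[s][i] <= j."""
    s = 0
    while s + 1 <= i and f[s + 1][i] <= j:
        s += 1
    return s
```

With the strings from the figures, `lcs_from_f(nakatsu_table("bcdababcb", "cbacbaaba"), 7, 4)` evaluates to 3, matching the value $\mathsf{lcs}(A[1..7],B[1..4]) = 3$ stated for Figure 4. The repeated `str.find` calls keep the code short but make it slower than the bounds discussed above.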

Theorems & Definitions (16)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • proof
  • proof: Proof of Lemma \ref{lem:recover-lcs}
  • Lemma 3
  • proof
  • Theorem 2
  • Lemma 4
  • proof
  • ...and 6 more