Faster Space-Efficient STR-IC-LCS Computation
Yuki Yonemoto, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai
TL;DR
The paper addresses the STR-IC-LCS problem: given strings $A$, $B$, and $P$, find a longest common subsequence of $A$ and $B$ that contains $P$ as a substring. Throughout, $n$ is the length of $A$ and $B$, $r$ is the length of $P$, $\ell$ is the LCS length of $A$ and $B$, and $\ell'$ is the STR-IC-LCS length. The paper presents three algorithms. Algorithm I computes the STR-IC-LCS length in $O(n^2)$ time and $O((\ell+1)(n-\ell+1))$ space. Algorithm II improves the running time to $O(nr/\log r + n(n-\ell+1))$ within the same space bound by pruning candidates. Algorithm III states both bounds in terms of $\ell'$: $O(nr/\log r + n(n-\ell'+1))$ time and $O((\ell'+1)(n-\ell'+1))$ space. All methods build on Nakatsu et al.'s sparse LCS framework and use minimal intervals for $P$ (computed via Das et al. for Algorithm II). The space bound is subquadratic whenever $\ell = o(n)$ or $n-\ell = o(n)$, Algorithms II and III also run in subquadratic time when $n-\ell$ (respectively $n-\ell'$) is $o(n)$, and an STR-IC-LCS string can still be reconstructed. The results advance constrained LCS computation, with implications for sequence analysis where pattern constraints are essential; the quadratic worst-case time bounds are near-optimal under standard hardness assumptions such as SETH.
Abstract
One of the most fundamental methods for comparing two given strings $A$ and $B$ is the longest common subsequence (LCS), where the task is to find (the length of) an LCS of $A$ and $B$. In this paper, we deal with the STR-IC-LCS problem, which is one of the constrained LCS problems proposed by Chen and Chao [J. Comb. Optim., 2011]. A string $Z$ is said to be an STR-IC-LCS of three given strings $A$, $B$, and $P$ if $Z$ is a longest string such that (1) $Z$ includes $P$ as a substring and (2) $Z$ is a common subsequence of $A$ and $B$. We present three efficient algorithms for this problem. First, we give a space-efficient solution which computes the length of an STR-IC-LCS in $O(n^2)$ time and $O((\ell+1)(n-\ell+1))$ space, where $n$ is the length of $A$ and $B$ and $\ell$ is the length of an LCS of $A$ and $B$. When $\ell = O(1)$ or $n-\ell = O(1)$, this algorithm uses only linear $O(n)$ space. Second, we present a faster algorithm that works in $O(nr/\log{r}+n(n-\ell+1))$ time, where $r$ is the length of $P$, while retaining the $O((\ell+1)(n-\ell+1))$ space efficiency. Third, we give an alternative algorithm that runs in $O(nr/\log{r}+n(n-\ell'+1))$ time with $O((\ell'+1)(n-\ell'+1))$ space, where $\ell'$ denotes the STR-IC-LCS length for input strings $A$, $B$, and $P$.
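To make the problem definition concrete, the following is a minimal sketch of a straightforward baseline that computes the STR-IC-LCS length. It is not any of the three algorithms of this paper: it stores two full LCS tables and therefore uses $\Theta(n^2)$ space, which is exactly the cost the space-efficient algorithms avoid. The function names `lcs_table`, `greedy_ends`, and `str_ic_lcs_length` are hypothetical and chosen only for illustration. The idea is to split a candidate $Z$ as $XPY$, take $X$ from prefixes of $A$ and $B$, embed $P$ greedily as a subsequence in each string, and take $Y$ from the remaining suffixes.

```python
def lcs_table(a, b):
    """Return t with t[i][j] = LCS length of a[:i] and b[:j]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t


def greedy_ends(s, p):
    """Map each start i with s[i] == p[0] to the exclusive end e of the
    shortest window s[i:e] containing p as a subsequence (if one exists)."""
    ends = {}
    for i in range(len(s)):
        if s[i] != p[0]:
            continue
        k = 0  # number of characters of p matched so far
        for e in range(i, len(s)):
            if s[e] == p[k]:
                k += 1
                if k == len(p):
                    ends[i] = e + 1
                    break
    return ends


def str_ic_lcs_length(a, b, p):
    """Length of an STR-IC-LCS of a, b, p, or None if none exists.
    Illustrative quadratic-space baseline, not the paper's method."""
    fwd = lcs_table(a, b)
    if not p:  # no constraint: plain LCS
        return fwd[len(a)][len(b)]
    # rev[x][y] = LCS length of the last x chars of a and last y chars of b
    rev = lcs_table(a[::-1], b[::-1])
    ends_a, ends_b = greedy_ends(a, p), greedy_ends(b, p)
    best = None
    for i, ea in ends_a.items():
        for j, eb in ends_b.items():
            cand = fwd[i][j] + len(p) + rev[len(a) - ea][len(b) - eb]
            if best is None or cand > best:
                best = cand
    return best
```

For instance, with $A = \texttt{xay}$, $B = \texttt{ayx}$, and $P = \texttt{x}$, the plain LCS has length 2 (`ay`), but the only common subsequence containing `x` is `x` itself, so the sketch returns 1. Embedding $P$ greedily from each start position is safe because ending the embedding earlier never shrinks the suffixes left for $Y$.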
