Expected Length of the Longest Common Subsequence of Multiple Strings

Ray Li; William Ren; Yiran Wen

TL;DR

This work analyzes the generalized Chvátal-Sankoff constant $\gamma_{k,d}$, the normalized expected length of the longest common subsequence (LCS) of $d$ random strings over an alphabet of size $k$. It proves a tight asymptotic for the binary case, $\gamma_{2,d} = \tfrac{1}{2} + \Theta(\tfrac{1}{\sqrt{d}})$, using a greedy diagonal-LCS approach for the lower bound and a Guruswami-Wang counting argument for the upper bound. For larger alphabets, it establishes near-optimal bounds when $d \ge \Omega(\log k)$, namely $\frac{1}{k}\bigl(1 + \frac{c_1}{\sqrt{d}}\bigr) \le \gamma_{k,d} \le \frac{1}{k}\bigl(1 + c_2\sqrt{\frac{\log k}{d}}\bigr)$, with reductions showing $\gamma_{k,d} \ge \frac{2}{k}\gamma_{2,d}$. The results connect LCS with list-decoding against deletions and extend rigorous LCS asymptotics beyond the classical two-string setting.

Abstract

We study the generalized Chvátal-Sankoff constant $\gamma_{k,d}$, which represents the normalized expected length of the longest common subsequence (LCS) of $d$ independent uniformly random strings over an alphabet of size $k$. We derive asymptotically tight bounds for $\gamma_{2,d}$, establishing that $\gamma_{2,d} = \frac{1}{2} + \Theta\left(\frac{1}{\sqrt{d}}\right)$. We also derive asymptotically near-optimal bounds on $\gamma_{k,d}$ for $d \ge \Omega(\log k)$.
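To make the quantity concrete, the following is a minimal Monte Carlo sketch, not the paper's method: it computes the exact LCS length of $d$ strings with the textbook dynamic program and averages it over random inputs. The function names `lcs_length` and `estimate_ratio`, along with the parameters `n`, `trials`, and `seed`, are illustrative assumptions. The dynamic program visits on the order of $n^d$ states, so it is only practical for small $n$ and $d$, and the resulting finite-$n$ ratio should be read as a rough, downward-biased stand-in for the limiting constant $\gamma_{k,d}$.

```python
import random
from functools import lru_cache


def lcs_length(strings):
    """Exact LCS length of d sequences via the standard d-dimensional DP."""
    d = len(strings)

    @lru_cache(maxsize=None)
    def solve(idx):
        # idx[i] is the length of the prefix of strings[i] still in play.
        if min(idx) == 0:
            return 0
        last = [s[i - 1] for s, i in zip(strings, idx)]
        if all(c == last[0] for c in last):
            # All prefixes end in the same symbol: match it everywhere.
            return 1 + solve(tuple(i - 1 for i in idx))
        # Otherwise at least one final symbol is unused: drop one and recurse.
        return max(
            solve(idx[:j] + (idx[j] - 1,) + idx[j + 1:]) for j in range(d)
        )

    return solve(tuple(len(s) for s in strings))


def estimate_ratio(k, d, n, trials, seed=0):
    """Average of LCS/n over `trials` draws of d uniform strings in [k]^n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        strings = [tuple(rng.randrange(k) for _ in range(n)) for _ in range(d)]
        total += lcs_length(strings)
    return total / (trials * n)
```

For example, `estimate_ratio(2, 3, 12, 50)` averages the LCS length of three random binary strings of length 12. Increasing `n` brings the estimate closer to the limiting constant, at a cost that grows like $n^d$.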
