Expected Length of the Longest Common Subsequence of Multiple Strings
Ray Li, William Ren, Yiran Wen
TL;DR
This work analyzes the generalized Chvátal-Sankoff constant $\gamma_{k,d}$, the normalized expected length of the longest common subsequence (LCS) of $d$ independent uniformly random strings over an alphabet of size $k$. For the binary case it proves the asymptotically tight estimate $\gamma_{2,d} = \tfrac{1}{2} + \Theta(\tfrac{1}{\sqrt{d}})$: the lower bound comes from a greedy diagonal-LCS approach, and the upper bound from a Guruswami–Wang counting argument. For larger alphabets it establishes near-optimal bounds when $d \ge \Omega(\log k)$, namely $\frac{1}{k}\bigl(1 + \frac{c_1}{\sqrt{d}}\bigr) \le \gamma_{k,d} \le \frac{1}{k}\bigl(1 + c_2\sqrt{\frac{\log k}{d}}\bigr)$, together with a reduction showing $\gamma_{k,d} \ge \frac{2}{k}\gamma_{2,d}$. The results connect the LCS of many random strings to list decoding against deletions and extend rigorous LCS asymptotics beyond the classical two-string setting.
Abstract
We study the generalized Chvátal-Sankoff constant $\gamma_{k,d}$, which represents the normalized expected length of the longest common subsequence (LCS) of $d$ independent uniformly random strings over an alphabet of size $k$. We derive asymptotically tight bounds for $\gamma_{2,d}$, establishing that $\gamma_{2,d} = \frac{1}{2} + \Theta\left(\frac{1}{\sqrt{d}}\right)$. We also derive asymptotically near-optimal bounds on $\gamma_{k,d}$ for $d \ge \Omega(\log k)$.
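
To make the quantity being bounded concrete, here is a minimal Python sketch that estimates the finite-length ratio $\mathbb{E}[\mathrm{LCS}]/n$ by simulation. It is an illustration only, not the method used in the paper: the function names `lcs_length` and `estimate_normalized_lcs` are hypothetical, and the exact dynamic program visits on the order of $n^d$ states, so it is usable only for very small $n$ and $d$. Because the expected LCS length is superadditive in $n$, these finite-length ratios lie below the limiting constant $\gamma_{k,d}$ in expectation, and a simulation at small $n$ should not be read as an estimate of the asymptotic bounds above.

```python
import random
from functools import lru_cache


def lcs_length(strings):
    """Exact LCS length of a list of sequences, via memoized DP over prefix lengths."""
    d = len(strings)

    @lru_cache(maxsize=None)
    def solve(idx):
        # idx[j] is the length of the prefix of strings[j] under consideration.
        if min(idx) == 0:
            return 0
        last = {strings[j][idx[j] - 1] for j in range(d)}
        if len(last) == 1:
            # Every prefix ends in the same symbol, so matching it is optimal.
            return 1 + solve(tuple(i - 1 for i in idx))
        # Otherwise some prefix's last symbol is unused; try dropping each one.
        return max(
            solve(idx[:j] + (idx[j] - 1,) + idx[j + 1:]) for j in range(d)
        )

    return solve(tuple(len(s) for s in strings))


def estimate_normalized_lcs(k, d, n, trials, seed=0):
    """Monte Carlo average of LCS/n for d uniform strings of length n over k symbols."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(trials):
        strings = [tuple(rng.randrange(k) for _ in range(n)) for _ in range(d)]
        total += lcs_length(strings)
    return total / (trials * n)


if __name__ == "__main__":
    # Small parameters only: the DP has roughly n**d states per trial.
    print(estimate_normalized_lcs(k=2, d=3, n=12, trials=20))
```

Keeping `n` small also keeps the recursion depth (at most about $d \cdot n$) well under Python's default limit; an iterative table would be the natural replacement for larger experiments.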
