Improved Lower Bounds on the Expected Length of Longest Common Subsequences
George T. Heineman, Chase Miller, Daniel Reichman, Andrew Salls, Gábor Sárközy, Duncan Soiffer
TL;DR
This paper advances the understanding of the Chvátal-Sankoff constants $\gamma_{\sigma,d}$ by establishing a new, improved lower bound for the canonical case of two binary strings, $\gamma_{2,2} \geq 0.792665992$, and by improving the lower bounds for a broad range of other $(\sigma,d)$. It builds on the Kiwi-Soto lower-bound framework, adding parallelization, a novel binary encoding, and a memory-efficient external-memory implementation that scale the computation to large string lengths. The key contributions are a high-performance implementation, state-of-the-art computer-assisted lower bounds across multiple parameters, and publicly available code for replication and further research. These improvements narrow the known range for $\gamma_{\sigma,d}$ and open paths toward tighter bounds and possible structural relations among the constants studied in the LCS literature.
Abstract
It has been proven that, when normalized by $n$, the expected length of a longest common subsequence of $d$ independent, uniformly random strings of length $n$ over an alphabet of size $\sigma$ converges to a constant that depends only on $d$ and $\sigma$. These values are known as the Chvátal-Sankoff constants, and determining their exact values is a well-known open problem. Upper and lower bounds are known for some combinations of $\sigma$ and $d$; for the most studied case, $\sigma=2, d=2$, the best lower and upper bounds are $0.788071$ and $0.826280$, respectively. Building on previous algorithms for lower-bounding the constants, we implement runtime optimizations, parallelization, and an efficient memory reading and writing scheme to obtain an improved lower bound of $0.792665992$ for $\sigma=2, d=2$. We additionally improve upon almost all previously reported lower bounds for the Chvátal-Sankoff constants when the alphabet size, the number of strings, or both are larger than 2.
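A small exact computation makes the definition concrete. Write $L_n$ for the LCS length of two random strings of length $n$. Concatenating an LCS of the two length-$m$ prefixes with an LCS of the two length-$n$ suffixes gives $L_{m+n} \geq L_m + L_n$, so $\mathbb{E}[L_n]$ is superadditive. By Fekete's lemma, $\gamma_{\sigma,2} = \sup_n \mathbb{E}[L_n]/n$, which means every finite $n$ yields a valid, if weak, lower bound. The sketch below covers only $d=2$ and computes $\mathbb{E}[L_n]/n$ exactly by brute-force enumeration of all string pairs. The helper names `lcs_length` and `expected_lcs_ratio` are hypothetical. This is an illustration of the definition, not the Kiwi-Soto method used in the paper, which produces far sharper bounds.

```python
from fractions import Fraction
from itertools import product


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (row-by-row O(len(x) * len(y)) DP)."""
    prev = [0] * (len(y) + 1)  # DP row for the previous prefix of x
    for a in x:
        curr = [0]  # LCS with the empty prefix of y is 0
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop a character from x or from y
        prev = curr
    return prev[-1]


def expected_lcs_ratio(n, sigma=2):
    """Exact E[L_n] / n for two independent uniform strings of length n over sigma letters.

    By superadditivity and Fekete's lemma this is a lower bound on gamma_{sigma,2}.
    Runtime grows like sigma**(2n), so only small n are feasible.
    """
    strings = list(product(range(sigma), repeat=n))
    total = sum(lcs_length(x, y) for x in strings for y in strings)
    return Fraction(total, len(strings) ** 2 * n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, float(expected_lcs_ratio(n)))
```

For example, `expected_lcs_ratio(2)` returns `Fraction(9, 16)`. Exact rational arithmetic is used so that each printed value is a genuine lower bound rather than a rounded estimate; the exponential cost of enumeration is exactly why practical bounds such as $0.792665992$ require the more elaborate algorithms the abstract describes.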
