The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes
Alexander Tiskin
TL;DR
This work addresses the Chvátal-Sankoff problem, which asks for the asymptotic constant $\gamma$ governing the normalised expected LCS length of two independent, uniformly random binary strings. The author formulates a stochastic network evolution framework (CS) that mirrors LCS combinatorics and connects it to a more flexible class of models, including a Bernoulli-based Model B and an extended Markov/AM2 framework (Model M). From a system of polynomial equations encoding stationarity, duality, and local fitting constraints, the work shows that $\gamma$ is algebraic and refines prior estimates for CS via exact-radical solutions of these equations. The approach combines combinatorial structures (LCS grid, sticky braids, transposition networks) with scaling limits and PDE-inspired descriptions to give a more precise handle on the Chvátal-Sankoff constant, and it points to further generalisations and numerical validation.
Abstract
Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $\gamma$, i.e. the limit of the normalised expected LCS length for two random binary strings of length $n \to \infty$, is a very natural problem, first posed by Chvátal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. Using methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, we link the constant $\gamma$ to the parameters of a certain stochastic particle process, which we use to obtain a new estimate for $\gamma$.
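To make the quantity in question concrete, the following minimal Python sketch estimates the normalised expected LCS length at a finite $n$ by plain Monte Carlo sampling. It is an illustration only and not the method of this work, which proceeds via a stochastic particle process; the function names `lcs_length` and `estimate_normalised_lcs`, the seed, and the sample sizes are assumptions made for the example.

```python
import random


def lcs_length(a, b):
    """LCS length by the textbook dynamic programme, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching characters extend the LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last character of one string or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_normalised_lcs(n, trials, seed=0):
    """Average of LCS(a, b) / n over `trials` pairs of uniform random binary strings."""
    rng = random.Random(seed)  # illustrative fixed seed, for reproducibility
    total = 0
    for _ in range(trials):
        a = [rng.randint(0, 1) for _ in range(n)]
        b = [rng.randint(0, 1) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    # Illustrative sizes only; the quadratic-time DP limits how large n can be.
    for n in (50, 100, 200):
        print(n, estimate_normalised_lcs(n, trials=50))
```

Such finite-$n$ averages only approximate the limit $\gamma$ and converge slowly, which is one reason analytic approaches to the constant are of interest.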
