The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes

Alexander Tiskin

TL;DR

This work addresses the Chvátal-Sankoff problem, which asks for the asymptotic constant $\gamma$ governing the normalized expected LCS length of two independent random binary strings. The author formulates a stochastic network evolution model (CS) that mirrors LCS combinatorics, and connects it to a more flexible class of models, including a Bernoulli-based Model B and an extended Markov/AM2 framework (Model M). By deriving a system of polynomial equations that encode stationarity, duality, and local fitting constraints, the author shows that $\gamma$ is algebraic and obtains a substantial refinement of prior estimates for CS via exact-radical solutions of these equations. The approach combines combinatorial structures (LCS grid, sticky braids, transposition networks) with scaling limits and PDE-inspired descriptions to characterize the Chvátal-Sankoff constant more precisely, and leaves room for further generalizations and numerical validation.

Abstract

Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $γ$, i.e. the limit of the normalised LCS length for two random binary strings of length $n \to \infty$, is a very natural problem, first posed by Chvátal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. Using methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, we link constant $γ$ to the parameters of a certain stochastic particle process, which we use to obtain a new estimate for $γ$.
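The quantity described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it uses the textbook dynamic-programming algorithm for LCS length and a plain Monte Carlo average, and the function names `lcs_length` and `estimate_normalised_lcs` are hypothetical. Because the expected LCS length is superadditive in $n$, the finite-$n$ average is expected to lie below $γ$ and to approach it only slowly, so this gives a sanity check rather than an estimate of the constant.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Standard dynamic programme, O(len(a) * len(b)) time, O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Matching symbols extend the LCS of the two prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_normalised_lcs(n, trials, seed=0):
    """Monte Carlo average of LCS(a, b) / n over uniformly random binary strings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.getrandbits(1) for _ in range(n)]
        b = [rng.getrandbits(1) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    # The example strings from Figure 1 have an LCS of length 3 (e.g. "OOO").
    print(lcs_length("IOOO", "OIOO"))
    # A finite-n average; illustrative only.
    print(estimate_normalised_lcs(n=200, trials=20))
```

Larger values of `n` and `trials` tighten the average at quadratic cost per trial.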

Paper Structure

This paper contains 35 sections, 4 theorems, 19 equations, 1 figure, and 1 table.

Key Result

proposition 1

In model $\mathit{CS}$, the types of any three distinct cells are mutually independent. The types of any three distinct cells within a $2 \times 2$ block of cells …

Figures (1)

  • Figure 1: LCS grid with a sticky braid (left), transposition network (centre), particle evolution model (right) for strings $a = \textsf{"IOOO"}$, $b = \textsf{"OIOO"}$

Theorems & Definitions (10)

  • proposition 1
  • proof
  • proposition 2
  • proof
  • definition 1
  • definition 2
  • proposition 3
  • definition 3
  • theorem 1
  • proof