The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes

Alexander Tiskin

TL;DR

This work addresses the Chvátal-Sankoff problem, which asks for the asymptotic constant $\gamma$ governing the normalized expected LCS length of two independent random binary strings. The author formulates a stochastic network evolution framework (CS) that mirrors LCS combinatorics, and connects it to a more flexible class of models, including a Bernoulli-based Model B and an extended Markov/AM2 framework (Model M). By deriving a system of polynomial equations that encode stationarity, duality, and local fitting constraints, the author shows that $\gamma$ is algebraic and obtains a substantial refinement of prior estimates for CS via exact-radical solutions of these equations. The approach blends combinatorial structures (LCS grid, sticky braids, transposition networks) with scaling limits and PDE-inspired descriptions to give a more precise handle on the Chvátal-Sankoff constant, while highlighting the potential for further generalizations and numerical validation.

Abstract

Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient $\gamma$, i.e. the limit of the normalised LCS length for two random binary strings of length $n \to \infty$, is a very natural problem, first posed by Chvátal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. Using methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, we link the constant $\gamma$ to the parameters of a certain stochastic particle process, which we use to obtain a new estimate for $\gamma$.
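
To make the definition of $\gamma$ concrete, the normalised LCS length can be approximated by plain Monte Carlo simulation. The sketch below is not the paper's method: it uses the textbook dynamic-programming LCS algorithm, and the names `lcs_length` and `estimate_gamma`, the string length and the trial count are all illustrative assumptions. Because the expected LCS length is superadditive in $n$, the expected value of such a finite-$n$ average lies at or below the limiting constant.

```python
import random


def lcs_length(a, b):
    """LCS length of sequences a and b via the classic O(len(a) * len(b)) DP."""
    prev = [0] * (len(b) + 1)  # DP row for the empty prefix of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching symbols extend the LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one of the prefixes.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_gamma(n, trials, seed=0):
    """Average of LCS(a, b) / n over independent uniform binary strings.

    Hypothetical helper for illustration only; n, trials and seed are
    assumptions of this sketch, not parameters from the paper.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(2) for _ in range(n)]
        b = [rng.randrange(2) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    print(estimate_gamma(n=200, trials=20))
```

Convergence in $n$ is slow, so a simulation of this kind only gives a rough sanity check and cannot replace the analytical estimates discussed in the paper.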

Paper Structure (35 sections, 4 theorems, 19 equations, 1 figure, 1 table)

This paper contains 35 sections, 4 theorems, 19 equations, 1 figure, 1 table.

Introduction
Acknowledgements
Related work
LCS combinatorics
Random LCS on permutation strings
Bounds and estimates for $\gamma$
Stochastic evolution models
Combinatorics of the LCS problem
LCS grid
Sticky braids
Transposition networks
Model $\mathit{CS}$
Cell dependencies
Evolution
Duality
...and 20 more sections

Key Result

proposition 1

In model $\mathit{CS}$, the types of any three distinct cells are mutually independent. The types of any three distinct cells within a $2 \times 2$ block of cells […]
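
The first sentence of the proposition can be checked by brute force on a small grid. The sketch below assumes that the "type" of cell $(i, j)$ is its match indicator, i.e. whether $a_i = b_j$ for uniform random binary strings $a$, $b$; this reading, and the helper names `cell_types` and `triples_are_independent`, are assumptions made for illustration. Since each cell is a match with probability $1/2$, mutual independence of three cells is equivalent to all eight joint outcomes having probability exactly $1/8$.

```python
from collections import Counter
from itertools import combinations, product


def cell_types(a, b):
    """Assumed cell type: 1 if a[i] == b[j] (match), else 0 (mismatch)."""
    return {(i, j): int(a[i] == b[j])
            for i in range(len(a)) for j in range(len(b))}


def triples_are_independent(m, n):
    """Check every triple of distinct cells in an m-by-n grid.

    All 2**(m + n) string pairs are equally likely, so probabilities
    are obtained by exact counting rather than sampling.
    """
    cells = [(i, j) for i in range(m) for j in range(n)]
    grids = [cell_types(a, b)
             for a in product((0, 1), repeat=m)
             for b in product((0, 1), repeat=n)]
    total = len(grids)
    for triple in combinations(cells, 3):
        counts = Counter(tuple(g[c] for c in triple) for g in grids)
        # Independent fair bits: each of the 8 outcomes has probability 1/8.
        if len(counts) != 8 or any(8 * k != total for k in counts.values()):
            return False
    return True


if __name__ == "__main__":
    print(triples_are_independent(3, 3))
```

Exhaustive enumeration is only feasible for very small grids; it illustrates the claim under the stated assumption but does not stand in for the paper's proof.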

Figures (1)

Figure 1: LCS grid with a sticky braid (left), transposition network (centre), particle evolution model (right) for strings $a = \textsf{"IOOO"}$, $b = \textsf{"OIOO"}$

Theorems & Definitions (10)

proposition 1
proof
proposition 2
proof
definition 1
definition 2
proposition 3
definition 3
theorem 1
proof
