Thinned COE random matrix models for DNA replication
Huw Day, Nina C. Snaith
TL;DR
The paper investigates whether the spacings between replication origins in diverse organisms can be modeled by random-matrix statistics, focusing on the Circular Orthogonal Ensemble (COE). It first compares empirical spacings to the COE expectation, approximated by Wigner's surmise, and then uses thinned COE ensembles, parameterized by a thinning factor $p$, to interpolate between COE and Poisson statistics. The root-mean-square error (RMSE) between empirical and model cumulative spacing distributions guides the fit. Results show that some model organisms, notably certain yeasts, are well described by a thinned COE with modest thinning, while more complex organisms such as humans exhibit large outliers that these models do not capture. The work suggests that thinning the COE provides a flexible framework for describing replication-origin spacing in some species, and it highlights the need for alternative mechanisms to explain spacing in more complex genomes.
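For reference, the two limiting spacing laws and the fit criterion mentioned above can be written out explicitly. These are the standard textbook forms, given here as an aid and not quoted from the paper; the uppercase $P$ is used for densities to avoid a clash with the thinning factor $p$, and the evaluation grid $s_1, \dots, s_K$ is an assumption. For spacings $s$ normalized to unit mean:

$$P_{\mathrm{W}}(s) = \frac{\pi}{2}\, s\, e^{-\pi s^{2}/4}, \qquad F_{\mathrm{W}}(s) = 1 - e^{-\pi s^{2}/4} \qquad \text{(Wigner's surmise, COE)}$$

$$P_{\mathrm{P}}(s) = e^{-s}, \qquad F_{\mathrm{P}}(s) = 1 - e^{-s} \qquad \text{(Poisson)}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K} \sum_{k=1}^{K} \bigl( F_{\mathrm{emp}}(s_k) - F_{\mathrm{model}}(s_k) \bigr)^{2}}$$

Here $F_{\mathrm{emp}}$ is the empirical cumulative distribution of the observed spacings and $F_{\mathrm{model}}$ is the cumulative distribution of the candidate model.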
Abstract
This paper details an observation that for more primitive organisms, such as some yeasts, the statistical distribution of the origins of replication sometimes looks remarkably like the distribution of eigenvalues from the Circular Orthogonal Ensemble (COE) of random matrices. This does not hold for more complex organisms, but a uniform thinning of the COE eigenvalues (which interpolates between the COE and uncorrelated, Poisson statistics) gives a platform to investigate characteristics of replication origin distribution in other species where data is available.
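The thinning idea can be illustrated numerically. The following is a minimal sketch, not the authors' code: it samples COE eigenangles as the eigenvalues of $U^{T}U$ for a Haar-distributed unitary $U$, keeps each eigenangle independently with probability `keep_prob`, and scores the resulting spacings against the Wigner and Poisson cumulative distributions by RMSE. The function names, the grid, and the matrix sizes are all hypothetical choices, and `keep_prob` is a retention probability; how it maps onto the paper's thinning factor $p$ is an assumption, since the convention is not stated here.

```python
import numpy as np


def haar_unitary(n, rng):
    """Haar-distributed n x n unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Fix the phase ambiguity of QR so that the result is Haar distributed.
    return q * (d / np.abs(d))


def coe_angles(n, rng):
    """Sorted eigenangles in [0, 2*pi) of a COE matrix S = U^T U."""
    u = haar_unitary(n, rng)
    s = u.T @ u
    return np.sort(np.angle(np.linalg.eigvals(s)) % (2 * np.pi))


def thinned_spacings(angles, keep_prob, rng):
    """Keep each angle with probability keep_prob (assumed convention),
    then return nearest-neighbour gaps on the circle, scaled to unit mean."""
    kept = angles[rng.random(angles.size) < keep_prob]
    if kept.size < 2:
        return np.empty(0)
    gaps = np.diff(np.append(kept, kept[0] + 2 * np.pi))
    return gaps / gaps.mean()


def spacing_sample(keep_prob, n, n_matrices, rng):
    """Pool normalized spacings from several independent thinned COE spectra."""
    parts = [thinned_spacings(coe_angles(n, rng), keep_prob, rng)
             for _ in range(n_matrices)]
    return np.concatenate(parts)


def empirical_cdf(samples, grid):
    """Fraction of samples less than or equal to each grid point."""
    return np.searchsorted(np.sort(samples), grid, side="right") / samples.size


def wigner_cdf(s):
    return 1.0 - np.exp(-np.pi * s**2 / 4.0)


def poisson_cdf(s):
    return 1.0 - np.exp(-s)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 4.0, 81)  # hypothetical evaluation grid
    for keep_prob in (1.0, 0.5, 0.1):
        s = spacing_sample(keep_prob, n=200, n_matrices=30, rng=rng)
        f = empirical_cdf(s, grid)
        print(keep_prob, rmse(f, wigner_cdf(grid)), rmse(f, poisson_cdf(grid)))
```

With no thinning the pooled spacings should sit closer to the Wigner curve, and with heavy thinning closer to the Poisson curve. Fitting real origin data would replace `spacing_sample` by the observed, normalized origin spacings and scan the thinning parameter for the smallest RMSE against a thinned-COE reference distribution.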
