
Thinned COE random matrix models for DNA replication

Huw Day, Nina C. Snaith

TL;DR

The paper investigates whether replication-origin spacings across diverse organisms can be modeled by random-matrix statistics, focusing on the Circular Orthogonal Ensemble (COE). It compares empirical spacings to the COE expectation using Wigner's surmise and then uses thinned COE ensembles, parameterized by a thinning factor $p$, to interpolate toward Poisson statistics; the root-mean-square error (RMSE) between empirical and model cumulative distributions guides the fit. Results show that some model organisms, notably certain yeasts, are well described by thinned COE with modest thinning, while more complex organisms such as humans exhibit large outliers not captured by these models. The work suggests that thinning COE provides a flexible framework for capturing replication-origin spacing in some species and highlights the need for alternative mechanisms to explain spacing in more complex genomes.
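The comparison against Wigner's surmise can be sketched in Python. For the COE with unit mean spacing, Wigner's surmise is $p(s)=\frac{\pi s}{2}e^{-\pi s^2/4}$, with cumulative form $1-e^{-\pi s^2/4}$; uncorrelated (Poisson) points give the exponential law $1-e^{-s}$. The helper `cdf_rmse` is hypothetical: it assumes the RMSE is evaluated between the empirical CDF and the model CDF at the sorted data points, which may not match the exact grid the paper uses.

```python
import math
from typing import Callable, Sequence


def wigner_pdf(s: float) -> float:
    """Wigner's surmise for COE spacings, normalised to unit mean spacing."""
    return (math.pi * s / 2.0) * math.exp(-math.pi * s * s / 4.0)


def wigner_cdf(s: float) -> float:
    """Cumulative form of Wigner's surmise: 1 - exp(-pi s^2 / 4)."""
    return 1.0 - math.exp(-math.pi * s * s / 4.0)


def poisson_cdf(s: float) -> float:
    """Cumulative exponential law for spacings of uncorrelated (Poisson) points."""
    return 1.0 - math.exp(-s)


def cdf_rmse(spacings: Sequence[float], model_cdf: Callable[[float], float]) -> float:
    """RMSE between the empirical CDF of unit-mean spacings and a model CDF.

    Assumed convention (hypothetical): the empirical CDF takes the value
    (i + 1) / n at the i-th smallest spacing, and the error is evaluated
    only at those points.
    """
    xs = sorted(spacings)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one spacing")
    total = sum(((i + 1) / n - model_cdf(x)) ** 2 for i, x in enumerate(xs))
    return math.sqrt(total / n)


if __name__ == "__main__":
    # Toy spacings with unit mean (hypothetical, not data from the paper).
    toy = [0.4, 0.7, 0.9, 1.0, 1.1, 1.3, 1.6]
    print("RMSE vs COE (Wigner):", cdf_rmse(toy, wigner_cdf))
    print("RMSE vs Poisson:     ", cdf_rmse(toy, poisson_cdf))
```

A smaller RMSE against `wigner_cdf` than against `poisson_cdf` indicates spacings that look more COE-like than uncorrelated.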

Abstract

This paper details an observation that for more primitive organisms, such as some yeasts, the statistical distribution of the origins of replication sometimes looks remarkably like the distribution of eigenvalues from the Circular Orthogonal Ensemble (COE) of random matrices. This does not hold for more complex organisms, but a uniform thinning of the COE eigenvalues (which interpolates between the COE and uncorrelated, Poisson statistics) gives a platform to investigate characteristics of replication origin distribution in other species where data is available.
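The uniform thinning described in the abstract can be simulated directly. This sketch uses a standard construction that the paper may or may not follow: draw a Haar-distributed unitary $U$ by QR decomposition of a complex Gaussian matrix, form the COE matrix $U^T U$, then delete each eigenvalue independently. The parameter `keep_prob` is the probability that an eigenvalue survives; whether the paper's thinning factor $p$ equals this retention probability or its complement is an assumption to check against the paper. With `keep_prob = 1.0` the spacings follow COE statistics, and as `keep_prob` shrinks the rescaled spacings approach the exponential (Poisson) law.

```python
import numpy as np


def haar_unitary(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample an n x n Haar-distributed (CUE) unitary matrix.

    QR-decompose a complex Ginibre matrix, then multiply each column of Q by
    the phase of the matching diagonal entry of R so the result is Haar.
    """
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # broadcasting scales column j by phase(r_jj)


def coe_angles(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sorted eigenangles in [0, 2*pi) of an n x n COE matrix.

    If U is Haar unitary, then U^T U is symmetric unitary and COE-distributed.
    """
    u = haar_unitary(n, rng)
    s = u.T @ u
    return np.sort(np.mod(np.angle(np.linalg.eigvals(s)), 2.0 * np.pi))


def thinned_unit_spacings(angles: np.ndarray, keep_prob: float,
                          rng: np.random.Generator) -> np.ndarray:
    """Keep each (sorted) eigenangle independently with probability keep_prob,
    then return nearest-neighbour spacings, including the wrap-around gap,
    rescaled to unit mean spacing."""
    kept = angles[rng.random(angles.size) < keep_prob]
    m = kept.size
    if m < 2:
        raise ValueError("fewer than two eigenangles survived thinning")
    gaps = np.diff(np.append(kept, kept[0] + 2.0 * np.pi))
    return gaps * m / (2.0 * np.pi)  # mean spacing on the circle is 2*pi/m


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for keep in (1.0, 0.5, 0.2):
        sp = np.concatenate([thinned_unit_spacings(coe_angles(200, rng), keep, rng)
                             for _ in range(20)])
        print(f"keep_prob={keep}: fraction of spacings < 0.25 = {np.mean(sp < 0.25):.3f}")
```

Thinning independently per eigenvalue keeps the construction simple; the fraction of very small spacings printed above should rise as `keep_prob` falls, reflecting the loss of level repulsion.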

Paper Structure

This paper contains 5 sections, 2 equations, 28 figures, 2 tables.

Figures (28)

  • Figure 1: This is Figure 3A from Newman. Original caption: "Inter-origin spacings in the S. cerevisiae genome. (A) Interorigin spacings in S. cerevisiae were calculated and assigned to different 1 kb bins. The frequency of origins in each bin is shown. Red dots: mean origin separation in a computer simulation where the same number of origins were placed at random on the whole S. cerevisiae genome. Grey dots: mean origin separation in a computer simulation where the same number of origins were placed at random only in the intergenic regions of the S. cerevisiae genome"
  • Figure 2: On the left is a visualisation of the eigenvalues of a typical $50\times 50$ COE matrix. Notice here that points are far less likely to cluster or have large spacings than in the figure on the right, which features 50 random, uncorrelated points.
  • Figure 3: The probability density function (left) and cumulative distribution function (right) for the distribution of spacings between $2\times2$ COE eigenvalues, normalised to have unit average spacing: Wigner's surmise $p(s)=\frac{\pi s}{2}e^{-\pi s^2/4}$. The height of the right-hand graph at a given spacing is the proportion of spacings smaller than that spacing.
  • Figure 4: Left: Histogram of re-scaled spacings between midpoints of adjacent replication origins from the yeast species Kluyveromyces lactis (K. lactis), data taken from Klactis, with Wigner's surmise and the exponential distribution for comparison. Right: Cumulative distribution of the same data. Total number of spacings: 142. Number of chromosomes: 6.
  • Figure 5: Left: Histogram of re-scaled spacings between midpoints of adjacent replication origins from the yeast species Lachancea waltii (L. waltii), data taken from lwaltti2, with Wigner's surmise and the exponential distribution for comparison. Right: Cumulative distribution of the same data. Total number of spacings: 186. Number of chromosomes: 8.
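The re-scaled spacings in Figures 4 and 5 can be sketched as follows, starting from origin coordinates grouped by chromosome. The exact rescaling is not stated in these captions, so this hypothetical helper assumes each chromosome is treated as a line segment (a chromosome with $k$ origins contributes $k-1$ spacings) and that spacings are divided by that chromosome's own mean spacing. The function name `rescaled_spacings` and the toy coordinates are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (start, end) coordinates of one replication origin on a chromosome.
Interval = Tuple[float, float]


def rescaled_spacings(origins: Dict[str, List[Interval]]) -> List[float]:
    """Pool spacings between midpoints of adjacent origins across chromosomes.

    Each chromosome is treated as a line (not a circle), so a chromosome with k
    origins contributes k - 1 spacings. Spacings are divided by that
    chromosome's own mean spacing (an assumed unfolding convention).
    """
    pooled: List[float] = []
    for intervals in origins.values():
        mids = sorted((start + end) / 2.0 for start, end in intervals)
        gaps = [b - a for a, b in zip(mids, mids[1:])]
        if not gaps:
            continue  # fewer than two origins: no spacing to record
        m = mean(gaps)
        if m <= 0:
            continue  # all origins share one midpoint: cannot rescale
        pooled.extend(g / m for g in gaps)
    return pooled


if __name__ == "__main__":
    # Hypothetical toy coordinates, not taken from any real genome.
    toy = {
        "chrI": [(0, 10), (100, 110), (300, 310)],
        "chrII": [(5, 5), (55, 55)],
    }
    s = rescaled_spacings(toy)
    print(len(s), [round(x, 3) for x in s])  # prints: 3 [0.667, 1.333, 1.0]
```

Rescaling per chromosome rather than genome-wide is one design choice among several; a genome-wide mean would instead preserve differences in origin density between chromosomes.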