Branch lengths for geodesics in the directed landscape and mutation patterns in growing spatially structured populations

Shirshendu Ganguly, Jason Schweinsberg, Yubo Shuai

Abstract

Consider a population that is expanding in two-dimensional space. Suppose we collect data from a sample of individuals taken at random either from the entire population, or from near the outer boundary of the population. A quantity of interest in population genetics is the site frequency spectrum, which is the number of mutations that appear on $k$ of the $n$ sampled individuals, for $k = 1, \dots, n-1$. As long as the mutation rate is constant, this number will be roughly proportional to the total length of all branches in the genealogical tree that are on the ancestral line of $k$ sampled individuals. While the rigorous literature has primarily focused on models without any spatial structure, in many natural settings, such as tumors or bacterial colonies, growth is dictated by spatial constraints. Many such two-dimensional growth models are expected to fall in the KPZ universality class. In this article we adopt the perspective that for population models in the KPZ universality class, the genealogical tree can be approximated by the tree formed by the infinite upward geodesics in the directed landscape, a universal scaling limit constructed in \cite{dov22}, starting from $n$ randomly chosen points. Relying on geodesic coalescence, we prove new asymptotic results for the lengths of the portions of these geodesics that are ancestral to $k$ of the $n$ sampled points and consequently obtain exponents driving the site frequency spectrum as predicted in \cite{fgkah16}. An important ingredient in the proof is a new tight estimate of the probability that three infinite upward geodesics stay disjoint up to time $t$, i.e., a sharp quantitative version of the well-studied N3G problem, which is of independent interest.
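
To make the branch-length quantity in the abstract concrete, the following minimal Python sketch sums, for each $k$, the lengths of all branches that are ancestral to exactly $k$ sampled leaves. It is an illustration only, not the authors' method: the toy tree, its node names, its branch lengths, and the function name `branch_lengths_by_descendants` are all assumptions introduced here.

```python
from collections import defaultdict


def branch_lengths_by_descendants(children, length, root):
    """Return {k: total length of branches ancestral to exactly k leaves}.

    children: dict mapping a node to the list of its child nodes
    length:   dict mapping each non-root node to the length of the
              branch joining it to its parent
    """
    totals = defaultdict(float)

    def count_leaves(node):
        kids = children.get(node, [])
        # A node with no children is a sampled individual (a leaf).
        k = 1 if not kids else sum(count_leaves(c) for c in kids)
        if node != root:
            # The branch above `node` is ancestral to its k leaves.
            totals[k] += length[node]
        return k

    count_leaves(root)
    return dict(totals)


# Hypothetical tree on n = 4 leaves a, b, c, d (all values are assumptions).
children = {"root": ["w", "d"], "w": ["u", "c"], "u": ["a", "b"]}
length = {"a": 1.0, "b": 1.0, "u": 1.0, "c": 2.0, "w": 1.0, "d": 3.0}

L = branch_lengths_by_descendants(children, length, "root")
print(L)
```

Under a constant mutation rate, the expected number of mutations carried by exactly $k$ sampled individuals would then be proportional to `L[k]` in this toy setting.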

Paper Structure

This paper contains 29 sections, 51 theorems, 297 equations, 18 figures.

Key Result

Theorem 1.1

Let $\widetilde{U}_n$ be $n$ points sampled uniformly at random from $[0,n]\times\{0\}$. Then there exists a constant $C_1\in(0,\infty)$ such that as $n\rightarrow\infty$, for all positive integers $k$,

Figures (18)

  • Figure 1: A genealogical tree of a sample of size $n = 6$. Dots indicate the times of mutations. Three mutations near the bottom of the tree are inherited by only one individual (1, 2, or 6). One mutation is inherited by the individuals 4 and 5. The two mutations closest to the top are inherited by three individuals each. Therefore, the site frequency spectrum is $M_{1,n} = 3$, $M_{2,n} = 1$, $M_{3,n} = 2$, and $M_{4,n} = M_{5,n} = 0$.
  • Figure 2: Genealogical trees of samples of size $n = 200$ from first passage percolation. The growth of the cluster was simulated until there were $N = 5{,}000{,}000$ occupied sites. In the picture on the left, the blue lines show the geodesic paths from the origin to $n$ randomly chosen points from the cluster. In the picture on the right, the $n$ points were sampled from the boundary of the cluster.
  • Figure 3: The three dots represent individuals along the top line with descendants surviving at least $r^{3/2}$ time units into the future, and the red ovals represent the descendants of these three individuals. The area of the ovals is $O(r^{5/2})$.
  • Figure 4: A comparison between $Anc_k(\Pi_{\lambda_n^*})\cap A_n$ (left plot) and $Anc_k(\Pi_{\lambda_n^*}\cap A_n)$ (right plot) with $k=3$. The thick lines represent branches that account for the length. The tree $T_1$ illustrates the case where the branch is counted in one but not the other. In the left plot, the thick portion is in the box $A_n$ and is along the ancestral lineage of $\{a,b,c\}$ in $\Pi_{\lambda_n^*}$. In the right plot, the thick portion is along the ancestral lineage of $\{b,c,d\}$ in $\Pi_{\lambda_n^*}\cap A_n$. These differences typically occur near the boundary of the box. The tree $T_2$ illustrates the typical case where the branch length counted in one is also counted in the other.
  • Figure 5: A geodesic $\gamma_1$ from $(y,t)$ to $(y',t')$ and a geodesic $\gamma_0\oplus\gamma_2\oplus\gamma_3$ from $(x,s)$ to $(y'',t'')$. One can switch from $\gamma_2$ to $\gamma_1$ so that $\gamma_0\oplus\gamma_1\oplus\gamma_3$ is a geodesic from $(x,s)$ to $(y'',t'')$.
  • ...and 13 more figures
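
The count in the Figure 1 caption can be reproduced with a short Python sketch that tallies, for each mutation, how many sampled individuals carry it. The caption specifies the carriers of the first four mutations; it does not say which three individuals carry each of the two mutations near the top, so the sets `{1, 2, 3}` and `{4, 5, 6}` below are assumptions chosen to be consistent with a tree. The function name `site_frequency_spectrum` is likewise hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(mutation_carriers, n):
    """Return [M_1, ..., M_{n-1}], where M_k is the number of
    mutations carried by exactly k of the n sampled individuals."""
    counts = Counter(len(set(carriers)) for carriers in mutation_carriers)
    return [counts.get(k, 0) for k in range(1, n)]


# Carrier sets for the six mutations in Figure 1 (sample size n = 6).
mutations = [
    {1}, {2}, {6},   # three mutations carried by a single individual
    {4, 5},          # one mutation carried by individuals 4 and 5
    {1, 2, 3},       # assumed carriers of a mutation near the top
    {4, 5, 6},       # assumed carriers of the other mutation near the top
]

sfs = site_frequency_spectrum(mutations, n=6)
print(sfs)  # [3, 1, 2, 0, 0]
```

The output matches the caption: $M_{1,n} = 3$, $M_{2,n} = 1$, $M_{3,n} = 2$, and $M_{4,n} = M_{5,n} = 0$.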

Theorems & Definitions (88)

  • Theorem 1.1
  • Theorem 1.2
  • Remark 1.3
  • Theorem 1.4
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 2.5
  • Proposition 2.6
  • ...and 78 more