
Estimating the growth rate of a birth and death process using data from a small sample

Carola Sophia Heinzel, Jason Schweinsberg

TL;DR

The paper addresses the problem of estimating the growth rate $r=\lambda-\mu$ of a supercritical birth-death process from the coalescence times of a small sample. The authors develop an analytical estimator based on coalescent point process data that does not assume a large sample size $n$, avoiding dependence on the tree shape and MCMC priors. They derive a family of estimators of the form $\hat{r} = \dfrac{c(n)(n-1)(n-2)}{\sum_{i,j}(H_{i,n}-H_{j,n})^+}$ for several choices of the constant $c(n)$, including a closed form for $c_{Inv}(n)$, and construct confidence intervals from simulated quantiles. In extensive simulations and applications to real data, the method performs as well as or better than existing large-$n$ methods when $n$ is small, at a substantially lower computational cost.
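As a minimal sketch (our own, not the authors' code), the estimator family can be evaluated for any user-supplied constant $c(n)$; the paper's specific choices, including $c_{Inv}(n)$, are not reproduced. Because each unordered pair contributes $|H_i - H_j|$ exactly once to $\sum_{i,j}(H_i-H_j)^+$, sorting the $n-1$ coalescence times gives an $O(n \log n)$ evaluation. The interval helper assumes that "simulated quantiles" means quantiles of the ratio $\hat{r}/r$, whose distribution is taken not to depend on $r$; that reading, and the names `growth_rate_estimate` and `pivot_confidence_interval`, are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def sum_positive_differences(h: Sequence[float]) -> float:
    """Sum of (h[i] - h[j])^+ over all ordered pairs (i, j).

    Each unordered pair contributes |h[i] - h[j]| exactly once, so with the
    values sorted as x_1 <= ... <= x_m the sum is sum_k (2k - m - 1) * x_k.
    """
    xs = sorted(h)
    m = len(xs)
    return sum((2 * k - m - 1) * x for k, x in enumerate(xs, start=1))


def growth_rate_estimate(h: Sequence[float], c: Callable[[int], float]) -> float:
    """r_hat = c(n) (n-1)(n-2) / sum_{i,j} (H_i - H_j)^+ from n - 1 coalescence times.

    `c` is a user-supplied (hypothetical) constant; the paper's choices of
    c(n), such as c_Inv(n), are not reproduced here.
    """
    n = len(h) + 1  # a sample of n individuals has n - 1 coalescence times
    if n < 3:
        raise ValueError("need n >= 3, i.e. at least two coalescence times")
    denom = sum_positive_differences(h)
    if denom <= 0:
        raise ValueError("coalescence times must not all be equal")
    return c(n) * (n - 1) * (n - 2) / denom


def empirical_quantile(sorted_x: Sequence[float], p: float) -> float:
    """Linearly interpolated quantile of already-sorted data, 0 <= p <= 1."""
    pos = p * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac


def pivot_confidence_interval(
    r_hat: float, ratio_draws: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Invert simulated quantiles of R = r_hat / r (assumed not to depend on r).

    If q_lo <= R <= q_hi with probability `level`, then
    r_hat / q_hi <= r <= r_hat / q_lo with the same probability.
    """
    xs = sorted(ratio_draws)
    alpha = (1 - level) / 2
    q_lo = empirical_quantile(xs, alpha)
    q_hi = empirical_quantile(xs, 1 - alpha)
    return r_hat / q_hi, r_hat / q_lo
```

For instance, `growth_rate_estimate([1.0, 3.0, 2.0], lambda n: 1.0)` has $n = 4$, pairwise sum $4$, and returns $6/4 = 1.5$. The `ratio_draws` would come from simulating $\hat{r}/r$ under the chosen model; how the paper generates them is not specified here.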

Abstract

The problem of estimating the growth rate of a birth and death process based on the coalescence times of a sample of $n$ individuals has been considered by several authors (\cite{stadler2009incomplete, williams2022life, mitchell2022clonal, Johnson2023}). This problem has applications, for example, to cancer research, when one is interested in determining the growth rate of a clone. Recently, \cite{Johnson2023} proposed an analytical method for estimating the growth rate using the theory of coalescent point processes, whose accuracy is comparable to that of more computationally intensive methods when the sample size $n$ is large. We use a similar approach to obtain an estimate of the growth rate that is not based on the assumption that $n$ is large. We demonstrate, through simulations using the R package \texttt{cloneRate}, that our estimator of the growth rate performs well in comparison with previous approaches when $n$ is small.
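The abstract refers to simulation studies. As a hedged illustration (not the \texttt{cloneRate} implementation), coalescence times for a sample of size $n$ can be drawn under one standard coalescent point process representation of a birth-death process with birth rate $\lambda$, death rate $\mu$, and Bernoulli sampling probability $\rho$ at time $T$. We assume that the node depth $H$ satisfies $1/P(H > t) = 1 + (\rho\lambda/r)(e^{rt}-1)$ and that, given $n$ sampled tips, $H_1,\dots,H_{n-1}$ are i.i.d. copies of $H$ conditioned on $H < T$. Both this parametrization and the function name `sample_coalescence_times` are our assumptions; the paper's own simulation model may differ.

```python
import math
import random
from typing import List, Optional


def sample_coalescence_times(
    n: int, lam: float, mu: float, rho: float, T: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw n - 1 coalescence times by inverting the conditional CDF of H given H < T.

    Assumed model: 1 / P(H > t) = F(t) = 1 + b * (exp(r t) - 1),
    with r = lam - mu > 0 and b = rho * lam / r (rho = sampling probability).
    """
    r = lam - mu
    if r <= 0:
        raise ValueError("expects a supercritical process (lam > mu)")
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    rng = rng or random.Random()
    b = rho * lam / r
    a = 1 - 1 / (1 + b * math.expm1(r * T))  # P(H <= T) under the assumed model
    times = []
    for _ in range(n - 1):
        u = rng.random() * a  # target value of P(H <= t), restricted to t < T
        # Solve 1 - 1/F(t) = u, i.e. exp(r t) - 1 = u / ((1 - u) * b).
        times.append(math.log1p(u / ((1 - u) * b)) / r)
    return times
```

A small $\rho$ (for example $10^{-7}$) mimics a small sample drawn from a large clone, which is the regime the paper targets; feeding these draws into an estimator and averaging squared errors over many replicates gives an RMSE comparison of the kind shown in the figures.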

Paper Structure

This paper contains 22 sections, 2 theorems, 41 equations, 20 figures, 1 table.

Key Result

Lemma 3.1

We have the convergence in distribution

Figures (20)

  • Figure 1: Genealogy of a sample of size $n = 9$ from a birth and death process. The green lines represent internal branches. The blue lines represent external branches. The 8 coalescence times are indicated by $H_1, \dots, H_8$.
  • Figure 2: Densities of the six estimators when $r = 0.5$, $T = 40$, and $n = 5$. The black dotted line indicates the true growth rate.
  • Figure 3: Densities of the six estimators when $r = 0.5$, $T = 40$, and $n = 20$. The black dotted line indicates the true growth rate.
  • Figure 4: Comparison of the RMSE of the six estimators when $r = 0.5$ and $T = 40$.
  • Figure 5: Comparison of the RMSE of the six estimators when $r = 1$ and $T = 40$.
  • ...and 15 more figures
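Figure 1 separates internal from external branches of the sampled genealogy. As a sketch, assuming (our assumption, standard for coalescent point processes) that the tips are listed in planar order so that tips $i < j$ coalesce at depth $\max(H_i, \dots, H_{j-1})$, the external branch of tip $k$ ends at the shallower of its two neighbouring depths, $\min(H_{k-1}, H_k)$, and the total length from the most recent common ancestor is $\max_i H_i + \sum_i H_i$. The function name `branch_lengths` is hypothetical.

```python
from typing import List, Sequence, Tuple


def branch_lengths(h: Sequence[float]) -> Tuple[List[float], float, float]:
    """Split a CPP genealogy into external and internal branch length.

    h holds the n - 1 node depths between consecutive tips in planar order.
    Returns (external length per tip, total external, total internal).
    """
    m = len(h)  # m = n - 1 coalescence times for n tips
    if m < 1:
        raise ValueError("need at least two tips")
    inf = float("inf")
    padded = [inf, *h, inf]
    # Tip k's lineage first merges at the shallower of its neighbouring depths.
    external = [min(padded[k], padded[k + 1]) for k in range(m + 1)]
    # Leftmost lineage down to the MRCA, plus one branch of length H_i per other tip.
    total = max(h) + sum(h)
    ext_total = sum(external)
    return external, ext_total, total - ext_total
```

For $H = (1, 3, 2)$ (four tips), the external lengths are $1, 1, 2, 2$ and the internal length is $3$: tips 1 and 2 merge at depth 1, tips 3 and 4 at depth 2, and the two pairs at depth 3.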

Theorems & Definitions (4)

  • Lemma 3.1
  • proof
  • Theorem 1
  • proof