Trajectory inference for a branching SDE model of cell differentiation

Elias Ventre, Aden Forrow, Nitya Gadhiwala, Parijat Chakraborty, Omer Angel, Geoffrey Schiebinger

TL;DR

This work shows how to use lineage trees available with recently developed CRISPR-based measurement technologies to disentangle proliferation and differentiation, and demonstrates the ability of this method to reliably reconstruct the landscape of a branching SDE from time-courses of simulated datasets with lineage tracing.

Abstract

A core challenge for modern biology is how to infer the trajectories of individual cells from population-level time courses of high-dimensional gene expression data. Birth and death of cells present a particular difficulty: existing trajectory inference methods cannot distinguish variability in net proliferation from cell differentiation dynamics, and hence require accurate prior knowledge of the proliferation rate. Building on Global Waddington-OT (gWOT), which performs trajectory inference with rigorous theoretical guarantees when birth and death can be neglected, we show how to use lineage trees available with recently developed CRISPR-based measurement technologies to disentangle proliferation and differentiation. In particular, when there is neither death nor subsampling of cells, we show that gWOT can be extended to the case with proliferation, with similar theoretical guarantees and computational cost and without requiring any prior information. In the case of death and/or subsampling, our method introduces a bias, which we describe explicitly and argue is inherent to these lineage-tracing data. We demonstrate in both cases the ability of this method to reliably reconstruct the landscape of a branching SDE from time courses of simulated datasets with lineage tracing, outperforming even a benchmark using the experimentally unavailable true branching rates.

Paper Structure (22 sections, 9 theorems, 73 equations, 7 figures)

Key Result

Theorem 1

With the previous notation, if the death rate $d = 0$, we can build, from the sequence of empirical measures $(\hat{\mu}_{t_i})_{i=1,\cdots,N}$ and the lineage tree, a sequence of experimental probabilistic distributions $\hat{p} = (\hat{p}_{t_i})_{i=1,\cdots,N}$ such that the minimizer $R_{N,\lambda,

Figures (7)

  • Figure 1: Different representations of the leaves of a tree evolving between $t_1$ and $t_2$, represented in (A), using: (B) the real generation numbers $m(X)$ associated to every leaf $X$ and (C) the observable generation numbers $\tilde{m}(X)$ associated to every leaf $X$. Note that if no cell dies between $t_1$ and $t_2$, the representations in (B) and (C) coincide.
  • Figure 2: Numerical convergence of the reweighted empirical measure associated to a branching SDE to the ground-truth distribution of the underlying SDE. First column: ground-truth distributions, obtained by simulating $2500$ cells with the SDE. Second column: empirical measure obtained by simulating $25$ trees containing, in total, $1050$ leaves (first row) and $2584$ leaves (second row), with the corresponding branching SDE. Third column: evolution of the RMS distance between the ground-truth and (green) the reweighted empirical measure, (red) the empirical measure (without reweighting), and (blue) the empirical distribution obtained by simulating the non-branching SDE with the same number of cells as the number of leaves obtained with the branching SDE. Each row of plots corresponds to one of the branching SDEs described in Appendix \ref{['appendix_parameters']}.
  • Figure 3: The time-varying distribution of an SDE with a double-well potential can be reconstructed using data from its associated branching SDE. Panel A shows the observed data when 5 independent trees are generated using potential $V_1$ from Appendix \ref{['appendix_parameters']}. B shows the ground-truth distribution we aim to recover, simulated using the underlying SDE with 500 cells at each timepoint. The cumulated RMS distance to this ground-truth at each timepoint is rather high if the MFL algorithm is applied with no correction for proliferation (C, dashed). Using a heuristic correction for known growth rates reduces the error (C, dashdotted), and applying our reweighting method reduces it further (C, solid line). The second row shows the reconstructed distributions using the three approaches: MFL without correction (D, visibly biased towards the more proliferative right well), MFL with the growth rate correction (E), and MFL on the reweighted marginals (F).
  • Figure 4: Estimation of (A)-(C) the velocity fields and (D) the birth rates reconstructed using the methods detailed in Section \ref{['subsec_applications']}. For the velocity fields, we represent in (A) (resp. B) the comparison between the inferred velocity field, with the formula \ref{['reconstruction_velocity']} applied to the time-varying distribution reconstructed with the MFL algorithm \ref{['reconstruction_velocity']}, and the ground-truth using the parameters of the first branching SDE described in Appendix \ref{['appendix_parameters']}, for the first gene (resp. the second gene). The grey line represents the diagonal, which would correspond to a perfect fit. In (C) we plot these two velocity fields along the trajectory, in dark for the inferred one and in red for the ground-truth. For the birth rate, we first use \ref{['eq_transfo_rho*_rhob*']} to estimate from this distribution the new distribution with branching $\rho^{*}$; second, we compute the associated birth rate using \ref{['reconstruction_proliferation']}. The background of (D) corresponds to the true birth rate used for the simulation. The data correspond to the reconstructed trajectories of $100$ cells at each timepoint with the MFL algorithm from time-series of snapshots obtained by simulating $15$ trees.
  • Figure 5: Evolution of the bias with respect to the subsampling rate. Simulations of the second branching SDE described in Appendix \ref{['appendix_parameters']} (A) without subsampling, (B) subsampling with a decreasing rate from 1 to 0.05, and (C) only the SDE without branching. In (D), we compare the cumulated RMS distance between the reweighted time-varying distributions and the ground-truth SDE at each timepoint for 5 different decreasing sequences of subsampling rates (including $q(t)=1$ where no subsampling occurs). Each box and whisker plot shows the distribution of RMS distances from 10 independent simulations. The subsampling rate indicated is $q(T)$, the rate for the last timepoint. The horizontal dashed line corresponds to the cumulated RMS distance between the simulations in (C) and the simulations of (A) without reweighting.
  • ...and 2 more figures
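
The reweighting by generation numbers that the captions of Figures 1 and 2 refer to can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes binary division, no death, and no subsampling, and it assumes that each leaf $X$ receives weight $2^{-m(X)}$, where $m(X)$ is its generation number. Under those assumptions the weights of a single tree always sum to 1, because each division replaces one cell of weight $2^{-m}$ by two daughters of weight $2^{-(m+1)}$. The double-well drift, the birth rate, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np


def birth_rate(x):
    # Hypothetical birth rate: cells in the right well (x > 0) divide faster.
    return 1.0 + 2.0 * (x > 0)


def simulate_tree(rng, t_end=1.0, dt=0.01, sigma=0.5, x0=0.0):
    """Simulate one tree of a 1D branching SDE with binary division and no death.

    Returns the leaf positions and their generation numbers m(X).
    """
    positions = np.array([x0])
    generations = np.array([0])
    for _ in range(int(round(t_end / dt))):
        # Euler-Maruyama step for dX = -V'(X) dt + sigma dW,
        # with the hypothetical double-well potential V(x) = x^4/4 - x^2/2.
        drift = -(positions**3 - positions)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(positions.shape)
        positions = positions + drift * dt + noise
        # Each cell divides during this step with probability birth_rate * dt.
        divides = rng.random(positions.shape) < birth_rate(positions) * dt
        # A dividing cell becomes two daughters at the same position,
        # each one generation deeper than the mother.
        generations = generations + divides
        positions = np.concatenate([positions, positions[divides]])
        generations = np.concatenate([generations, generations[divides]])
    return positions, generations


def leaf_weights(generations):
    # Assumed reweighting rule: weight 2^{-m(X)} for a leaf of generation m(X).
    return 2.0 ** (-generations.astype(float))


def reweighted_measure(trees):
    """Pool the leaves of several independent trees into one weighted sample."""
    positions = np.concatenate([pos for pos, _ in trees])
    weights = np.concatenate([leaf_weights(gen) for _, gen in trees])
    # Each tree carries total weight 1, so dividing by the number of trees
    # normalises the pooled weights to a probability vector.
    return positions, weights / len(trees)


rng = np.random.default_rng(0)
trees = [simulate_tree(rng) for _ in range(25)]
x, w = reweighted_measure(trees)
print("number of leaves:", x.size)
print("unweighted mean position:", x.mean())
print("reweighted mean position:", np.sum(w * x))
```

In this toy setting the unweighted leaf sample over-represents the faster-dividing region, while the reweighted sample targets the law of the underlying non-branching SDE. Once cells die or are subsampled, only the observable generation numbers $\tilde{m}(X)$ are available and the identity above no longer holds exactly.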

Theorems & Definitions (19)

  • Theorem 1
  • Definition 2
  • Proposition 5
  • proof : Proof of Proposition \ref{['prop_master_equation_realm']}
  • Corollary 6
  • proof
  • Theorem 7
  • proof
  • Lemma 8: see for example pavon1991free
  • Proposition 9
  • ...and 9 more