On the Mathematics of RNA Velocity II: Algorithmic Aspects
Tiejun Li, Yizhuo Wang, Guoguo Yang, Peijie Zhou
TL;DR
The paper tackles the algorithmic foundations of RNA velocity analysis by fixing a global gene-shared latent time, quantifying uncertainty in EM-inferred kinetic parameters, optimizing the velocity-kernel bandwidth for random-walk approximations, and deriving transition-time estimates between cell states via mean first hitting times. It introduces two time-scale fixation strategies (multiplicative and additive noise models) yielding closed-form solutions for the gene-shared time and gene-wise scaling, and provides synthetic validations demonstrating robust time alignment across genes. Uncertainty quantification is grounded in Fisher information and SEM-based EM analyses, delivering practical confidence intervals for kinetic parameters and velocity directions. Finally, the work analyzes the finite-sample behavior of velocity-induced random walks, identifies the optimal kernel bandwidth scaling with sample size and dimension, and presents a taboo-set technique to obtain meaningful transition times in bifurcating developmental trajectories. Collectively, these contributions offer rigorous, implementable tools to improve robustness and interpretability of RNA velocity workflows in scRNA-seq data.
Abstract
In a previous paper [CSIAM Trans. Appl. Math. 2 (2021), 1-55], the authors proposed a theoretical framework for the analysis of RNA velocity, a promising concept in scRNA-seq data analysis for revealing the cell state-transition dynamics underlying snapshot data. The current paper is devoted to the algorithmic study of some key components of the RNA velocity workflow. Four important points are addressed: (1) We construct a rational time-scale fixation method that determines the global gene-shared latent time for cells. (2) We present an uncertainty quantification strategy for the parameters inferred by the EM algorithm. (3) We establish the optimal criterion for choosing the velocity kernel bandwidth with respect to the sample size in the downstream analysis and discuss its implications. (4) We propose an approach for estimating the temporal distance between two cell clusters along the cellular development path. Some illustrative numerical tests are also carried out to verify our analysis. These results are intended to provide tools and insights for the further development of RNA velocity-type methods.
