Table of Contents
Fetching ...

Joint Probability Distribution of mRNA and Protein Molecules in a Stochastic Gene Expression Model

Yuntao Lu, Yunxin Zhang

TL;DR

This work addresses the challenge of analytically analyzing stochastic gene expression models with multiple gene states by deriving a hierarchy of ODEs for two-dimensional binomial moments $\mathcal{B}_{p,q}$ from the chemical master equation via a generating-function formulation. The authors show that, in steady state, these binomial moments satisfy a linear system from which the joint distribution $\mathbb{P}(m,n)$ of mRNA and protein can be reconstructed using $\mathbb{P}(m,n)=\sum_{p=m}^\infty\sum_{q=n}^\infty (-1)^{p+q+m+n}\binom{p}{m}\binom{q}{n}\mathcal{B}_{p,q}$. They provide explicit low-order moment expressions, establish a stable hierarchical algorithm to compute moments to arbitrary order, and compare the exact results with burst-approximation models, showing that the mean is preserved while the variance may differ, with a quantitative bound on the discrepancy. The method enables rigorous, non-burst analyses of complete gene-expression models and offers a practical route to quantify when burst approximations fail, with potential extensions to multiscale modeling and MSM construction from molecular dynamics data.

Abstract

Stochastic modeling of gene expression is a classic problem in theoretical biophysics. However, models formulated via chemical master equation have long been considered analytically intractable unless burst approximation is applied. This article shows that general stochastic gene expression models with an arbitrary number of gene states admit direct analysis. Based on chemical master equation and high-dimensional binomial moment method, we derive recurrence relations for binomial moments in steady state, yielding analytical expressions to arbitrary order in a hierarchical manner. Subsequently, the joint probability mass function of mRNA and protein copy number can be reconstructed. An algorithm is developed for numerical computation. Particularly, explicit expressions for low-order cumulants are presented. Compared with models under burst approximation, the mean remains exact, whereas the variance typically differs. We estimate the difference between two second-order binomial moments using functional analysis, therefore evaluating the validity of burst approximation.

Joint Probability Distribution of mRNA and Protein Molecules in a Stochastic Gene Expression Model

TL;DR

This work addresses the challenge of analytically analyzing stochastic gene expression models with multiple gene states by deriving a hierarchy of ODEs for two-dimensional binomial moments from the chemical master equation via a generating-function formulation. The authors show that, in steady state, these binomial moments satisfy a linear system from which the joint distribution of mRNA and protein can be reconstructed using . They provide explicit low-order moment expressions, establish a stable hierarchical algorithm to compute moments to arbitrary order, and compare the exact results with burst-approximation models, showing that the mean is preserved while the variance may differ, with a quantitative bound on the discrepancy. The method enables rigorous, non-burst analyses of complete gene-expression models and offers a practical route to quantify when burst approximations fail, with potential extensions to multiscale modeling and MSM construction from molecular dynamics data.

Abstract

Stochastic modeling of gene expression is a classic problem in theoretical biophysics. However, models formulated via chemical master equation have long been considered analytically intractable unless burst approximation is applied. This article shows that general stochastic gene expression models with an arbitrary number of gene states admit direct analysis. Based on chemical master equation and high-dimensional binomial moment method, we derive recurrence relations for binomial moments in steady state, yielding analytical expressions to arbitrary order in a hierarchical manner. Subsequently, the joint probability mass function of mRNA and protein copy number can be reconstructed. An algorithm is developed for numerical computation. Particularly, explicit expressions for low-order cumulants are presented. Compared with models under burst approximation, the mean remains exact, whereas the variance typically differs. We estimate the difference between two second-order binomial moments using functional analysis, therefore evaluating the validity of burst approximation.

Paper Structure

This paper contains 22 sections, 1 theorem, 40 equations, 5 figures, 1 algorithm.

Key Result

THEOREM 1

Let $C=(c_{i,j})_{N\times N}$ be a $N\times N$ strictly row diagonally dominant matrix with $\Theta:=\min_{1\leq i\leq N}\left(\mid c_{i,i}\mid-\sum_{j\neq i}\mid c_{i,j}\mid\right)$. Then $\lVert C^{-1}\rVert_\infty\leq \Theta^{-1}$.

Figures (5)

  • Figure 1: Hierarchy of Binomial Moments: This image illustrates the hierarchical structure for calculating binomial moments up to desired orders. The layers are organized from $L=0$ at the top to $L=5$ at the bottom. Within each layer, binomial moments are arranged from left to right in order of increasing $p$ (and correspondingly decreasing $q$). The calculation proceeds from lower to higher layers. Within each layer, binomial moments are derived sequentially from right to left. The arrows indicate the dependency flow, showing how binomial moments in higher layers depend on values calculated earlier in the same or preceding layers.
  • Figure 2: Probability Distribution of mRNA and Protein Copy Number obtained using Analytical Results: This stem plot illustrates the probability mass function obtained by first computing binomial moments according to Algorithm S.1, and then reconstructing via \ref{['reconstruct']}. In this example, parameters in the model \ref{['CME1']} are $D_0=$$-2.020.010.010.1-7.20.100.01-6.01$, $D_1 =$$101151015$, $u=3$, $v=2$, and $\delta=1$. The largest layer in Algorithm S.1 is $L_{\text{max}}=300$. $\mathbb{P}(m,n)$ is computed for $0\leq m,n\leq 16$. The infinite series \ref{['reconstruct']} is truncated up to the largest layer $L_{\text{max}}$. In the stem plot in the top panel, we use the coarse-grained probability mass function $P(m,n)$ defined in \ref{['sec6']}. The two stem plots in the bottom panel are the marginal distribution of mRNA and protein copy number, respectively. The marginal distributions are obtained directly from the joint probability distribution.
  • Figure 3: Probability Distribution of mRNA and Protein Copy Number obtained through Stochastic Simulations: The parameters in the model \ref{['CME1']} are the same as those in \ref{['our']}. The histogram in the upper panel is generated with $1\times10^6$ trajectories of SSA, all truncated at dimensionless time $t=50$. The initial condition is $S(0)=1$, $M_1(0)=0$, and $M_2(0)=0$. Python package GillesPy2 is implemented with C++ solver. The two stem plots in the bottom panel are the marginal distribution of mRNA and protein copy number, respectively. The marginal distributions are obtained directly from the joint probability distribution.
  • Figure 4: Probability Distribution of mRNA and Protein Copy Number obtained through Finite State Projection Algorithm: The parameters in the model \ref{['CME1']} are the same as those in \ref{['our']}. The histograms in the top panel (joint probability distribution of mRNA and protein copy number) are plotted according to FSP at $t=20$, and the truncation error is below $1\times10^{-4}$. The initial condition is $S(0)=1$, $M_1(0)=0$, and $M_2(0)=0$. The two stem plots in the bottom panel are the marginal distribution of mRNA and protein copy number, respectively. The marginal distributions are obtained directly from the joint probability distribution.
  • Figure 5: Gap between Variance and its Upper Bound: In the left panel, all the parameters expect for $u$ are the same as those in \ref{['our']} and are fixed. While $u$ varies in $[1,20]$, we compute the discrepancy between the variances of the protein copy number according to \ref{['LowBMcoarse']} and \ref{['LowBMburst']}, and also evaluate its upper bound given by \ref{['accuracy']}. In the right panel, $u=10$, $v=2$, and $\delta=1$ are fixed, while $D_0$ and $D_1$ take the form $a_{i,j}=0\;(i\neq j)$ and $b_{i,j}=i$. For the order of the model $1\leq N\leq 100$, we compute the discrepancy between the variances of the protein copy number according to \ref{['LowBMcoarse']} and \ref{['LowBMburst']}, and also evaluate its upper bound given by \ref{['accuracy']}. Note that both illustrations are plotted in $\ln(\cdot)$ scale, and that the discrepancy between variances equals two times the discrepancy between second-order binomial moments.

Theorems & Definitions (1)

  • THEOREM