Table of Contents
Fetching ...

Moment-based parameter inference with error guarantees for stochastic reaction networks

Zekai Li, Mauricio Barahona, Philipp Thomas

TL;DR

This work proposes a method for the inference of parameters of stochastic reaction networks that works for both steady-state and time-resolved data and is applicable to networks with non-linear and rational propensities and demonstrates its use for uncertainty quantification, data integration, and prediction of latent species statistics through synthetic data.

Abstract

Inferring parameters of models of biochemical kinetics from single-cell data remains challenging because of the uncertainty arising from the intractability of the likelihood function of stochastic reaction networks. Such uncertainty falls beyond current error quantification measures, which focus on the effects of finite sample size and identifiability but lack theoretical guarantees when likelihood approximations are needed. Here, we propose a method for the inference of parameters of stochastic reaction networks that works for both steady-state and time-resolved data and is applicable to networks with non-linear and rational propensities. Our approach provides bounds on the parameters via convex optimisation over sets constrained by moment equations and moment matrices by taking observations to form moment intervals, which are then used to constrain parameters through convex sets. The bounds on the parameters contain the true parameters under the condition that the moment intervals contain the true moments, thus providing uncertainty quantification and error guarantees. Our approach does not need to predict moments and distributions for given parameters (i.e., it avoids solving or simulating the forward problem), and hence circumvents intractable likelihood computations or computationally expensive simulations. We demonstrate its use for uncertainty quantification, data integration and prediction of latent species statistics through synthetic data from common non-linear biochemical models including the Schlögl model and the toggle switch, a model of post-transcriptional regulation at steady state, and a birth-death model with time-dependent data.

Moment-based parameter inference with error guarantees for stochastic reaction networks

TL;DR

This work proposes a method for the inference of parameters of stochastic reaction networks that works for both steady-state and time-resolved data and is applicable to networks with non-linear and rational propensities and demonstrates its use for uncertainty quantification, data integration, and prediction of latent species statistics through synthetic data.

Abstract

Inferring parameters of models of biochemical kinetics from single-cell data remains challenging because of the uncertainty arising from the intractability of the likelihood function of stochastic reaction networks. Such uncertainty falls beyond current error quantification measures, which focus on the effects of finite sample size and identifiability but lack theoretical guarantees when likelihood approximations are needed. Here, we propose a method for the inference of parameters of stochastic reaction networks that works for both steady-state and time-resolved data and is applicable to networks with non-linear and rational propensities. Our approach provides bounds on the parameters via convex optimisation over sets constrained by moment equations and moment matrices by taking observations to form moment intervals, which are then used to constrain parameters through convex sets. The bounds on the parameters contain the true parameters under the condition that the moment intervals contain the true moments, thus providing uncertainty quantification and error guarantees. Our approach does not need to predict moments and distributions for given parameters (i.e., it avoids solving or simulating the forward problem), and hence circumvents intractable likelihood computations or computationally expensive simulations. We demonstrate its use for uncertainty quantification, data integration and prediction of latent species statistics through synthetic data from common non-linear biochemical models including the Schlögl model and the toggle switch, a model of post-transcriptional regulation at steady state, and a birth-death model with time-dependent data.
Paper Structure (18 sections, 61 equations, 6 figures)

This paper contains 18 sections, 61 equations, 6 figures.

Figures (6)

  • Figure 1: Flowchart of inference using moment constraints. Given a stochastic reaction network model (potentially with unobserved species), we use a series of constraints involving the moments and the reaction rate parameters to form a constrained set on them. The mathematical expressions of these constraints can be found in Sec. \ref{['sec:mmteqns']}. From count data at steady state or time-resolved data, we obtain bootstrap intervals on the moments and input these into the set. By introducing additional variables to replace the non-linear terms in the set, we have a semi-definite program (SDP) over which we optimise with respect to a parameter (or an unobserved moment) to obtain upper and lower bounds.
  • Figure 2: Parameter inference of the Schlögl model with varying moment order and sample size.(a) Optimisation-based bounds on $\langle x^1 \rangle_\pi$ and $\langle x^2 \rangle_\pi$ converge to the exact moment (abbr. to mmt.) values. Methods see kuntz2019bounding. (b) Bootstrap intervals of moments, with sample sizes 10000 and 20000, and the true moment values. Values are log-transformed with base 10. (c) Bounds on $k_1$ (left) and $k_3$ (right) obtained from $\zeta^d$ using bootstrap moment intervals with sample sizes 10000 and 20000, and using the exact moments. (d) Feasible regions of $k_1$ and $k_3$ obtained from moment intervals with sample size 20000 for $d = 3,\ldots, 7$. (e) Feasible regions of $k_1$ and $k_3$ obtained from moment intervals with sample sizes varying from 5000 to 25000 and $d = 4$. True parameters are $\boldsymbol{k} = (2,3,1,4)^\top$ and $k_2$ and $k_4$ are assumed to be known in (b) - (e).
  • Figure 3: Data integration for parameter inference of the toggle switch model with rational propensities.(a) Trajectories of the two species in the system. Parameters are Par1. (b) Histogram of a dataset generated with parameters Par1. (c) Box plots of natural-log-transformed relative bounds on $k_3$ with different methods and sample sizes (left: $n=500$, right: $n=2500$). Experiments are repeated 20 times. The bounds Sep i are obtained by using the dataset generated with parameters Par i and the bounds Comb are obtained by combining the five datasets used for Sep. Unbounded-above (abbr. to unb. abv.) issues are omitted for Sep with a sample size of 500. Parameters used here are Par1 (20, 0.7, 10, 1), Par2 (24, 0.82, 10, 1), Par3 (30, 1.1, 10, 1), Par4 (22, 0.8, 10, 1) and Par5 (28, 0.9, 10, 1). (d) The proportion of experiments resulting in at least one parameter unbounded-above compared with both parameters bounded-above (abbr. to bd. abv.) when inference is performed using each of the five datasets separately or using their combination. The sample size of each dataset increases from 250 to 1500, and the experiments are repeated 20 times for each sample size. The parameter settings are the same as in (c). In panels (c) and (d), the parameters $k_1$ and $k_2$ are assumed to be known and $d=6$.
  • Figure 4: Parameter bounds of the post-transcriptional regulation model with partial data.(a) Relative bounds on the unknown parameters when $k_1,k_3,k_5$ are known (left) and when $k_2,k_4,k_5$ are known (right) with $d=4$. In each case, we compare the bounds obtained when $X_1$ is unobserved; when $X_2$ is unobserved; when we have two datasets each with only one species measured; and when we have full data. (b) Histogram of $X_1$ and $X_2$ showing they have similar magnitudes at steady state. (c) Relative bounds on parameters obtained with $d = 3,4,5$. We compare the bounds on the unknown parameters when assuming $k_3,k_4,k_5$ known but $X_1$ observed and when assuming $k_1,k_2,k_5$ known but $X_2$ observed. In all panels, the true parameters are $\boldsymbol{k} = (6,0.8,5,0.5,1)^\top$. The sample size is $n=20000$ in panels (a) and (c).
  • Figure 5: Inference of moments of the unobserved species in the post-transcriptional regulation model. Parameters $(k_3,k_4,k_5)$ and $(k_1,k_2,k_5)$ are assumed to be known, respectively, when $X_1$ and $X_2$ are observed. The sample size is $n=20000$, the order $d$ is 3, and the true parameters are the same as in Fig. \ref{['fig:sp2_1']}. The relative bounds are obtained by dividing the original bounds by bootstrap intervals of the corresponding moments.
  • ...and 1 more figures