Gradient estimators for parameter inference in discrete stochastic kinetic models

Ludwig Burger, Annalena Kofler, Lukas Heinrich, Ulrich Gerland

Abstract

Stochastic kinetic models are ubiquitous in physics, yet inferring their parameters from experimental data remains challenging. In deterministic models, parameter inference often relies on gradients, as they can be obtained efficiently through automatic differentiation. However, these tools cannot be directly applied to stochastic simulation algorithms (SSA) such as the Gillespie algorithm, since sampling from a discrete set of reactions introduces non-differentiable operations. In this work, we adapt three gradient estimators from machine learning to the Gillespie SSA: the Gumbel-Softmax Straight-Through (GS-ST) estimator, the Score Function (SF) estimator, and the Alternative Path (AP) estimator. We compare the properties of all estimators in two representative systems exhibiting relaxation or oscillatory dynamics, where the latter requires gradient estimation of time-dependent objective functions. We find that the GS-ST estimator mostly yields well-behaved gradient estimates, but exhibits diverging variance in challenging parameter regimes, resulting in unsuccessful parameter inference. In these cases, the other estimators provide more robust, lower-variance gradients. Our results demonstrate that gradient-based parameter inference can be integrated effectively with the Gillespie SSA, with different estimators offering complementary advantages.
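The Gumbel-max reparametrization underlying the GS-ST estimator can be sketched in a few lines: perturbing the log-propensities with Gumbel noise and taking the argmax yields an exact categorical sample, while replacing the argmax by a softmax at temperature $\tau$ gives the continuous relaxation used for gradient evaluation. The propensity vector below is a toy example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max_sample(log_probs, rng):
    """Gumbel-max trick: argmax of Gumbel-perturbed logits is an exact categorical sample."""
    g = rng.gumbel(size=log_probs.shape)
    return np.argmax(log_probs + g)

def gumbel_softmax(log_probs, tau, rng):
    """Softmax relaxation of the same sample; as tau -> 0 it approaches a one-hot vector."""
    g = rng.gumbel(size=log_probs.shape)
    z = (log_probs + g) / tau
    z -= z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = np.array([0.1, 0.3, 0.6])          # toy normalized reaction propensities
samples = [gumbel_max_sample(np.log(p), rng) for _ in range(20000)]
freq = np.bincount(samples, minlength=3) / len(samples)   # should approximate p
```

Sampling with the exact argmax while backpropagating through the softmax is what makes the estimator "straight-through": the forward pass stays discrete, only the gradient sees the relaxation.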

Paper Structure

This paper contains 12 sections, 29 equations, and 6 figures.

Figures (6)

  • Figure 1: Illustration of the Gillespie SSA and gradient estimators. (a) The Gillespie SSA generates stochastic trajectories by sampling a molecule update $\Delta N_s$ and a time increment $\Delta t_s$ at each step $s$ based on the previous state and the parameters $\theta$. It is not directly differentiable with respect to $\theta$ because both $\Delta t_s$ and $\Delta N_s$ are sampled from parameter-dependent probability distributions, and $\Delta N_s$ is a discrete random variable. (b) In the GS-ST estimator, gradients are obtained by reparametrizing the sampling of $\Delta t_s$ and $\Delta N_s$ using inverse transform sampling for the time and the Gumbel-max trick for the molecule update. For gradient evaluation, the discrete molecule update is approximated by the continuous, differentiable softmax function, enabling automatic differentiation. (c) In the Score Function (SF) estimator, gradients are obtained by evaluating the product of the simulation output and the score function. At each Gillespie step, score contributions from the molecule and the time update are computed and accumulated along the trajectory. (d) In the Alternative Path (AP) estimator, gradients are obtained by evaluating the weighted difference of an alternative and a primal path. The alternative path $\Delta \tilde{N}_s$ can either be updated or re-sampled in every Gillespie step. The gradient contribution of the time update $\Delta t_s$ is included through a score term, analogous to the SF estimator.
  • Figure 2: Bimolecular association model. (a) Starting from an initial configuration without bimolecular complexes, the system relaxes toward a steady state with a non-zero number of A--B complexes. The steady-state copy number of complexes is governed by the dissociation rate $k$, the total numbers of molecules A and B, and the system volume (here $N_{\rm A}^{\rm tot}=N_{\rm B}^{\rm tot}=200$, $V=20$, and $k_{\rm ref}=5$). (b) The distribution of A--B complexes obtained at time $t=0.5$ agrees well with the analytical distribution derived from the chemical master equation. (c) The loss function measuring the relative deviation between simulated and reference mean copy numbers at time $t = 0.5$ exhibits a clear minimum at $k_{\rm ref}$. (d--f) The gradient of the loss function, ${\rm d}\mathcal{L}/{\rm d}\log k$, evaluated using the GS-ST method (d), the SF method (e), and the AP method (f), agrees well with the analytical reference. Data points in panels (c--f) show mean and uncertainties obtained by averaging over 50 independent batches, each consisting of 2000 stochastic trajectories.
  • Figure 3: Variance scaling of gradient estimators in the bimolecular association model. (a) The variance of the GS-ST gradient estimate shows different scaling behavior with the number of Gillespie steps $s$ for different values of the temperature $\tau$. For large $\tau$, the variance converges, whereas for small $\tau$ it grows exponentially. (b) The Lyapunov exponent associated with the GS-ST estimator indicates parameter regimes with convergent (negative exponent) or divergent variance (positive exponent). The temperature at which the Lyapunov exponent changes sign depends on the dissociation rate $k$. (c) The variances of the gradients estimated with the SF and AP estimators increase linearly with $s$. (d) For the SF estimator, the linear variance scaling of the estimated gradient results from the linear variance scaling of the score function itself. Data points show the mean variance and its uncertainty obtained by computing the variance over $10^4$ stochastic trajectories and repeating this procedure ten times; in panel (a), the shaded region indicates the 10th--90th percentile interval instead of the standard deviation. Unless stated otherwise, $k = k_{\rm ref}$.
  • Figure 4: Repressilator model. (a) In the repressilator, three protein species are produced and degraded, and mutually repress each other’s production. (b) In certain parameter regimes, the system exhibits self-sustained oscillations under intrinsic noise. Mean and variance are estimated from 10 000 Gillespie trajectories with $k_{\rm p}=100$, $K_{\rm d}=10$, $h=3$, and $V=1$. (c--e) Analysis of the dynamical regimes based on the dimensionless chemical rate equations. (c) Self-sustained oscillations arise for sufficiently large effective production rate $k_{\rm p}/K_{\rm d}$ (colored regions), while the white regions correspond to relaxation dynamics. The dimensionless oscillation amplitude depends only weakly on the Hill coefficient. (d) The dimensionless oscillation amplitude increases approximately linearly with the effective production rate. (e) The oscillation period scales logarithmically with the effective production rate.
  • Figure 5: Parameter inference in the repressilator model. The production rate constant $k_{\rm p}$ and dissociation constant $K_{\rm d}$ are inferred from trajectory data by minimizing a loss function using stochastic gradient descent; all other parameters are fixed ($h=3$, $V=1$). Results are shown for 50 optimization runs, corresponding to different reference parameter sets and different initializations of the optimization. The reference parameters span two orders of magnitude. Panels (a--b), (c--d), and (e--f) compare inferred and reference parameter values obtained using GS-ST, SF, and AP, respectively. In each optimization step, the gradient of the loss is estimated from 1000 stochastic trajectories; for GS-ST, we use $\tau = 0.3$ for the relaxation of the molecule update, and $\tau_{\rm time} = 0.05$ for the relaxation of the time update. Overall, GS-ST and SF recover the reference parameters well, whereas AP exhibits larger deviations.
  • ...and 1 more figure
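The Score Function estimator described in Figure 1(c) can be sketched concretely for the bimolecular association model of Figure 2. For the dissociation reaction with propensity $a_{\rm diss} = k\,n_{\rm AB}$, the per-step score contributions with respect to $\log k$ are $+1$ whenever a dissociation fires and $-a_{\rm diss}\,\Delta t$ from the waiting-time density; accumulating these along a trajectory and correlating with the output gives the gradient estimate. The sketch below uses assumed, scaled-down parameters ($k_{\rm on}=1$, $V=5$, 20 molecules of each species) chosen for speed, not the paper's settings ($N^{\rm tot}=200$, $V=20$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, assumed parameters for A + B <-> AB (not the paper's values)
K_ON, VOL = 1.0, 5.0       # association rate constant and volume
N_TOT, T_END = 20, 0.5     # initial copies of A and B; observation time

def gillespie_with_score(k, rng):
    """One Gillespie trajectory; accumulates the score d log p(traj) / d log k."""
    n_ab, t, score = 0, 0.0, 0.0
    while True:
        n_a = n_b = N_TOT - n_ab
        a_assoc = K_ON * n_a * n_b / VOL
        a_diss = k * n_ab
        a0 = a_assoc + a_diss
        dt = rng.exponential(1.0 / a0)
        if t + dt > T_END:
            score -= (T_END - t) * a_diss   # survival term for the censored interval
            return n_ab, score
        t += dt
        score -= dt * a_diss                # d(-a0*dt)/d log k = -k*n_AB*dt
        if rng.random() < a_diss / a0:      # dissociation fires
            n_ab -= 1
            score += 1.0                    # d log(a_diss)/d log k = 1
        else:
            n_ab += 1

def sf_gradient(k, n_traj, rng):
    """SF estimate of d E[n_AB(T_END)] / d log k, with a mean baseline."""
    out = np.array([gillespie_with_score(k, rng) for _ in range(n_traj)])
    f, s = out[:, 0], out[:, 1]
    return np.mean((f - f.mean()) * s)      # baseline subtraction reduces variance

# Larger k dissolves complexes, so the gradient should come out negative.
grad = sf_gradient(5.0, 4000, rng)
```

Subtracting the empirical mean of the output as a baseline exploits that the score has zero expectation, which leaves the estimate asymptotically unbiased while cutting its variance; the paper's linear-in-$s$ variance growth of this estimator (Figure 3) is visible here as noise that grows with the observation window.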