Table of Contents
Fetching ...

Exactly simulating stochastic chemical reaction networks in sub-constant time per reaction

Joshua Petrack, David Doty

TL;DR

The first chemical reaction network stochastic simulation algorithm that can simulate reactions, provably preserving the exact stochastic dynamics (sampling from precisely the same distribution as the Gillespie algorithm), yet using time provably sublinear in $\ell$.

Abstract

The model of chemical reaction networks is among the oldest and most widely studied and used in natural science. The model describes reactions among abstract chemical species, for instance $A + B \to C$, which indicates that if a molecule of type $A$ interacts with a molecule of type $B$ (the reactants), they may stick together to form a molecule of type $C$ (the product). The standard algorithm for simulating (discrete, stochastic) chemical reaction networks is the Gillespie algorithm [JPC 1977], which stochastically simulates one reaction at a time, so to simulate $\ell$ consecutive reactions, it requires total running time $Ω(\ell)$. We give the first chemical reaction network stochastic simulation algorithm that can simulate $\ell$ reactions, provably preserving the exact stochastic dynamics (sampling from precisely the same distribution as the Gillespie algorithm), yet using time provably sublinear in $\ell$. Under reasonable assumptions, our algorithm can simulate $\ell$ reactions among $n$ total molecules in time $O(\ell/\sqrt n)$ when $\ell \ge n^{5/4}$, and in time $O(\ell/n^{2/5})$ when $n \le \ell \le n^{5/4}$. Our work adapts an algorithm of Berenbrink, Hammer, Kaaser, Meyer, Penschuck, and Tran [ESA 2020] for simulating the distributed computing model known as population protocols, extending it (in a very nontrivial way) to the more general chemical reaction network setting. We provide an implementation of our algorithm as a Python package, with the core logic implemented in Rust, with remarkably fast performance in practice.

Exactly simulating stochastic chemical reaction networks in sub-constant time per reaction

TL;DR

The first chemical reaction network stochastic simulation algorithm that can simulate reactions, provably preserving the exact stochastic dynamics (sampling from precisely the same distribution as the Gillespie algorithm), yet using time provably sublinear in .

Abstract

The model of chemical reaction networks is among the oldest and most widely studied and used in natural science. The model describes reactions among abstract chemical species, for instance , which indicates that if a molecule of type interacts with a molecule of type (the reactants), they may stick together to form a molecule of type (the product). The standard algorithm for simulating (discrete, stochastic) chemical reaction networks is the Gillespie algorithm [JPC 1977], which stochastically simulates one reaction at a time, so to simulate consecutive reactions, it requires total running time . We give the first chemical reaction network stochastic simulation algorithm that can simulate reactions, provably preserving the exact stochastic dynamics (sampling from precisely the same distribution as the Gillespie algorithm), yet using time provably sublinear in . Under reasonable assumptions, our algorithm can simulate reactions among total molecules in time when , and in time when . Our work adapts an algorithm of Berenbrink, Hammer, Kaaser, Meyer, Penschuck, and Tran [ESA 2020] for simulating the distributed computing model known as population protocols, extending it (in a very nontrivial way) to the more general chemical reaction network setting. We provide an implementation of our algorithm as a Python package, with the core logic implemented in Rust, with remarkably fast performance in practice.

Paper Structure

This paper contains 36 sections, 82 equations, 4 figures, 6 algorithms.

Figures (4)

  • Figure 1: Plots of counts vs. time for the Lotka-Volterra oscillator CRN, for initial population size $n \in \{10^3,10^4,10^5,10^8\}$ with half predator, half prey, using both rebop (Gillespie algorithm) and our batching algorithm implementation. Stochastic effects make the plots behave differently for small $n$, as well as differently from a deterministic ODE approximation, also plotted. As $n$ increases, stochastic noise decreases, and both stochastic simulators generate trajectories approaching the deterministic model. At $n=10^8$ all three plots are nearly indistinguishable.
  • Figure 2: \ref{['fig:dimerization_counts_vs_time']} shows a plot of ten example runs of the reversible dimerization reaction $2M \mathop{\rightleftharpoons}\limits_1^1 D$, starting with $\#M = 100$ and $\#D = 0$, showing counts vs time for each run. The CRN approaches an equilibrium with expected $25$ copies of $D$ and expected $50$ copies of $M$, typically reaching there after about 1 unit of time. (Though of course the counts bounce around even at equilibrium.) \ref{['fig:dimerization_comparison']} shows a plot of empirical distributions from rebop and the batching algorithm, of the count of $D$ at time 0.5, just prior to (likely) convergence, where $\mathrm{E}\left[ \#D \right] \approx 20.5.$ Here, "empirical probability" means the total number of runs in which the given count was the count of $D$ at time $0.5$, divided by the total number of trials.
  • Figure 3: Runtime scaling of our batching algorithm vs. the Gillespie algorithm, as implemented by the rebop Python package rebop, and the pure Rust rebop crate rebop_rust. Both Lotka-Volterra reactions (left) and Oregonator reactions (right) are simulated on the given initial population size $n$ (with an even split between all species) until time 1.0, which corresponds to $\Theta(n)$ total reactions. As expected, in both cases rebop shows asymptotic scaling of $\Theta(n)$ time (slope 1 on a log-log plot) compared to scaling of $\Theta(\sqrt{n})$ for batching (slope 1/2). However, batching is much more efficient for Lotka-Volterra. For exact reactions and rate constants used, see \ref{['fig:lotka_volterra_plot_with_passive_reactions']}.
  • Figure 4: Plot of counts vs. time for three CRNs with initial molecular count $n=10^5$, with the fraction of non-passive reactions also shown in red (note separate $y$-axes on right). The first two simulations are not majorly impeded by excessive passive reactions. For the Oregonator, however, the smallest sampled fraction of non-passive reactions is around 0.0003, representing a slowdown factor of $\approx 3000$. During the periodic part, this value oscillates between around 0.002 and 0.1. Rate constants modified from soloveichik2010dna.

Theorems & Definitions (33)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • proof
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • ...and 23 more