A practical introduction to ODE modelling in Stan for biological systems

Sara Hamis; John Forslund; Cici Chen Gu; Jodie A. Cochrane

A practical introduction to ODE modelling in Stan for biological systems

Sara Hamis, John Forslund, Cici Chen Gu, Jodie A. Cochrane

Abstract

Integrating dynamical systems models with time series data is a central part of contemporary mathematical biology. With the rich variety of available models and data, numerous methods and computational tools have been developed for these purposes. One such tool is Stan, a freely available and open-source probabilistic programming framework that provides efficient methods for estimating model parameters from data using computational Bayesian inference algorithms. Stan includes built-in mechanisms for working with ordinary differential equation (ODE) models, which are widely used in mathematical biology and related fields to study simulated, experimental, and real-world systems that change over time. Through step-by-step worked examples, including both pedagogical toy models and applications with real data, this article provides a practical, self-contained introduction to performing parameter estimation and model evaluation for first-order linear and nonlinear ODE models in Stan. The article also explains key statistical methods that underpin Stan and discusses computational Bayesian modelling in the context of biological applications.

A practical introduction to ODE modelling in Stan for biological systems

Abstract

Paper Structure (39 sections, 36 equations, 10 figures, 1 table, 1 algorithm)

This paper contains 39 sections, 36 equations, 10 figures, 1 table, 1 algorithm.

Introduction
Background
Back to basics and Bayes' rule
Embracing uncertainty in Bayesian statistics and mathematical biology
What (and who) is Stan?
From Bayes' rule to Stan's algorithms
Sampling from posterior distributions
It all comes down to proposing and accepting (or rejecting)
Random-walk Metropolis
Metropolis--Hastings
Hamiltonian Monte Carlo
The No-U-Turn Sampler
Revisiting Bayes' rule for estimating parameters in ODE models
A shallow description of different pooling methods
Diagnosing sampling behaviour
...and 24 more sections

Figures (10)

Figure 1: A graphical illustration of Bayes’ rule with informative and uninformative priors. The top row has an informative prior $p(\theta)=\mathrm{Beta}(20,20)$, which is approximately Normal in shape and centered at $\theta=0.5$, while the bottom row has an noninformative prior $p(\theta)=\mathrm{Beta}(1,1)$, equivalent to a uniform distribution on $[0,1]$. In both cases, the likelihood takes the shape of a $\mathrm{Beta}(30,5)$ distribution as a function of $\theta$ and is shown on a relative scale, since it represents the probability of the observed data conditional on $\theta$, rather than a density over $\theta$ itself. The unnormalised posteriors are obtained by pointwise multiplication of the prior and likelihood in $\theta$. These are normalised in the right-most column, yielding posteriors that are proper probability distributions. Grey-shaded areas under the curves represent probability density functions normalized to unit area, whereas unshaded curves represent quantities shown on relative (unnormalised) scales.
Figure 2: A graphical illustration of how sampling constructs an empirical posterior. The left panel shows a trace of sampled $\theta$-values, beginning with burn-in samples that are discarded when constructing the posterior (yellow background), followed by retained samples. The middle panel summarises sample counts with a binned histogram, and the right panel shows the normalised empirical posterior obtained from the retained samples, represented as a probability mass function over $\theta$.
Figure 3: A schematic demonstration of proposal distributions in random-walk Metropolis (left) and Metropolis--Hastings with a shifted normal proposal (right). At each sample index $i$, the filled circle marks the current state $\theta^{(i)}$ and the ridge depicts the proposal density. The cross at index $i$ indicates the proposed state $\theta^{'(i)}$. At the next index $i+1$, the state satisfies $\theta^{(i+1)}=\theta^{'(i)}$ upon acceptance and $\theta^{(i+1)}=\theta^{(i)}$ upon rejection, as denoted by teal and red crosses, respectively.
Figure 4: A schematic illustration of Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS) in a one-dimensional parameter space. (a) The target distribution $\pi(\theta)$, i.e. the distribution explored by the sampler. (b) The negative log of the target defines a potential energy landscape on which the HMC and NUTS samplers operate. (c) A schematic HMC trajectory on the potential energy landscape, illustrating that fixed-length trajectories may reverse direction, i.e., U-turn. The Hamiltonian trajectory (solid line) and leapfrog stopping points (open circles) prior to the U-turn are shown in blue and correspond to motion toward increasing $\theta$. The trajectory (dashed line) and leapfrog stopping points (filled circles) after the U-turn are shown in red and correspond to motion toward decreasing $\theta$. (d) A schematic NUTS trajectory, which adaptively terminates the trajectory to avoid U-turns. Trajectories are schematic representations of a single leapfrog-based Hamiltonian trajectory.
Figure 5: A visual demonstration of how a likelihood function for a model parameter is constructed from an ODE formulation, a noise model, and observed data. Left: The line shows a forward simulation of the ODE model $dy(t)/dt = -\theta\, y(t)$ with initial condition $y(0)=1$, and $\theta=0.5$. Black points indicate observed data. The noise model is fixed as $\tilde{y}_i = y(t_i;\theta) + \varepsilon_i$, with $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ and $\sigma=0.1$. Shaded ridges indicate the probability density of observing data at each measurement time under the chosen ODE, parameter value $\theta$, and noise model. Middle: The lines show forward simulations of the ODE model $y(t;\theta)$ for different fixed values of $\theta$, illustrating how changes in the parameter affect the model’s ability to predict the observed data. Right: The likelihood is constructed by evaluating the discrepancy between the observed data $\mathcal{D} = \{\tilde{y}_i\}_{i=1}^5$ and the corresponding model predictions $y(t_i;\theta)$ for each candidate parameter value under the fixed noise model. Markers indicate the parameter values explored in the middle panel, following the same colour legend.
...and 5 more figures

A practical introduction to ODE modelling in Stan for biological systems

Abstract

A practical introduction to ODE modelling in Stan for biological systems

Authors

Abstract

Table of Contents

Figures (10)