
Gaussian approximations for fast Bayesian inference of partially observed branching processes with applications to epidemiology

Angus Lewis, Antonio Parrella, John Maclean, Andrew J. Black

TL;DR

The paper develops a Gaussian transition-density approximation for continuous-time multitype branching processes to enable fast Bayesian inference via Kalman filtering, addressing the computational bottlenecks of exact particle-filter-based methods in large populations. It introduces a hybrid switching strategy that combines Gaussian filtering with particle filtering to maintain accuracy when populations are small and leverage speed when they are large. The approach is validated on SEIR and SE8I8R epidemic models and applied to a complex COVID-19 dataset from Victoria, showing substantial speedups with controlled bias. The work offers a scalable, practical toolkit for state and parameter estimation in partially observed branching processes, with potential extensions to higher-order moments and efficient variance computations.
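The sketch below shows, under simplifying assumptions, what a Gaussian transition-density approximation involves. It uses the simplest single-type case, a linear birth-death process with per-capita rates `birth` and `death`. These names are chosen here for illustration. The paper treats general multitype processes, whose moments come from matrix-valued equations rather than these scalar closed forms. The mean and variance after time `dt` have known closed forms, and matching a normal distribution to them takes a fixed number of arithmetic operations whatever the starting population `z0`.

```python
import math


def birth_death_moments(z0, birth, death, dt):
    """Mean and variance of a linear birth-death process started from z0 individuals.

    Illustrative single-type case only (an assumption, not the paper's multitype
    model). Each initial individual evolves independently (branching property),
    so both moments scale linearly in z0.
    """
    r = birth - death
    growth = math.exp(r * dt)
    mean = z0 * growth
    if abs(r) < 1e-12:
        # Critical case (birth == death): variance grows linearly in time.
        var = z0 * 2.0 * birth * dt
    else:
        var = z0 * (birth + death) / r * growth * (growth - 1.0)
    return mean, var


def gaussian_transition_logpdf(z1, z0, birth, death, dt):
    """Log-density at z1 of the moment-matched Gaussian N(mean, var)."""
    mean, var = birth_death_moments(z0, birth, death, dt)
    return -0.5 * (math.log(2.0 * math.pi * var) + (z1 - mean) ** 2 / var)


# The arithmetic is identical for 10 or 10 million individuals.
for z0 in (10, 10_000_000):
    m, v = birth_death_moments(z0, birth=0.5, death=0.25, dt=1.0)
    print(f"z0={z0}: mean={m:.1f}, sd={math.sqrt(v):.1f}")
```

By contrast, simulating the same transition exactly would require work that grows with the number of events, and hence with `z0`.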

Abstract

We consider the problem of inference for the states and parameters of a continuous-time multitype branching process from partially observed time series data. Exact inference for this class of models, typically using sequential Monte Carlo, can be computationally challenging when the populations being modelled grow exponentially or the time series is long. Instead, we derive a Gaussian approximation for the transition function of the process that leads to a Kalman filtering algorithm whose run time is independent of the population sizes. We also develop a hybrid approach for when populations are smaller and the approximation is less applicable. We investigate the performance of our approximation and algorithms on both a simple and a complex epidemic model, finding good adherence to the true posterior distributions in both cases with large computational speed-ups in most cases. We also apply our method to a COVID-19 dataset with time-dependent parameters, where exact methods are intractable due to the population sizes involved.
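The Kalman filtering idea can be illustrated with a hypothetical scalar model, not the paper's multitype model. The one-step transition is approximated as z_t | z_{t-1} ≈ N(a·z_{t-1}, c·z_{t-1}). Each observation is the true count plus Gaussian noise of variance `r`. This observation model and the name `kalman_loglik` are assumptions for illustration; the paper's models observe new daily cases. The prediction step uses the law of total variance: Var(z_t) = E[c·z_{t-1}] + Var(a·z_{t-1}). The returned log-likelihood sums the Gaussian one-step predictive log-densities, which is the quantity a Bayesian sampler needs. For a linear birth-death process with birth rate b, death rate d and observation interval Δ, a = e^{(b-d)Δ} and c = (b+d)/(b-d)·a(a-1).

```python
import math


def kalman_loglik(ys, m0, p0, a, c, r):
    """Approximate log p(y_1:T) under a hypothetical scalar Gaussian model.

    Assumed model (illustrative only):
      z_t | z_{t-1} ~ approx N(a * z_{t-1}, c * z_{t-1})   # moment-matched transition
      y_t | z_t     ~ N(z_t, r)                            # noisy observed count
    Returns (log-likelihood, final filtered mean, final filtered variance).
    """
    m, p = m0, p0
    loglik = 0.0
    for y in ys:
        # Predict: mean is linear; variance via the law of total variance.
        m_pred = a * m
        p_pred = c * m + a * a * p
        # One-step predictive distribution of y_t is N(m_pred, p_pred + r).
        s = p_pred + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + (y - m_pred) ** 2 / s)
        # Update: standard scalar Kalman gain.
        k = p_pred / s
        m = m_pred + k * (y - m_pred)
        p = (1.0 - k) * p_pred
    return loglik, m, p


# Hypothetical rates (per day) and made-up daily observations, for illustration only.
b, d, dt = 0.5, 0.25, 1.0
a = math.exp((b - d) * dt)
c = (b + d) / (b - d) * a * (a - 1.0)
ll, m, p = kalman_loglik([13.0, 16.0, 22.0], m0=10.0, p0=0.0, a=a, c=c, r=4.0)
print(ll, m, p)
```

Each call costs O(T) operations for a series of length T, regardless of how large the counts are. A filter of this kind can therefore sit inside an MCMC loop that evaluates many parameter proposals.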

Paper Structure

This paper contains 17 sections, 39 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: Simple SEIR epidemic model dynamics for exposed and infectious individuals. The grey lines show the full realisations, with the dots indicating the state of each realisation at $t = 5,\,15,\,25$. The solid black line is the true mean of the process over time. The crosses are the true means of the process at times $t = 5,\,15,\,25$. The ellipses are the one-standard-deviation contours of the Gaussian approximations, centred at the crosses, with covariance equal to that of the continuous-time branching process (CTBP) at the respective times.
  • Figure 2: Simple SEIR epidemic model dynamics for new daily cases. The grey lines show sample paths of new daily cases for 20 realisations, with the dots highlighting a single realisation. The solid blue line is the true mean of new daily cases and the ribbons mark plus and minus one standard deviation from the mean.
  • Figure 3: Simulated realisations of the SEIR epidemic model for three values of the reproductive number $R_0=\beta/\lambda=1.12,\,2.8,\,4.\overline{6}$ (solid lines), together with the median (dashed lines) and symmetric 80% credible interval of the filtering distribution, $p(z_{t,i}\,|\, \mathbf y_{1:t}),\, i=1,2,3$, estimated by the Gaussian approximation. Note that the y-axes are on logarithmic scales and that the filtering distributions are estimated only from the observed case data; the exposed and infectious populations are unobserved.
  • Figure 4: (Left) Posterior distributions for three values of the reproductive number $R_0=\beta/\lambda=1.12,\,2.8,\,4.\overline{6}$ (top, middle and bottom, respectively) with $T=25$, using the three methods to evaluate the likelihood for the SEIR model. (Right) Posterior distributions for three time-series lengths $T=10,\,15,\,25$ (top, middle and bottom, respectively) with $R_0=2.8$, using the same three methods. The vertical lines mark the posterior means.
  • Figure 5: The median and symmetric 80% credible interval of the filtering distribution, $p(z_{t,i}\,|\, \mathbf y_{1:t}),\, i=1,2,3$, estimated by the Gaussian approximation (blue) and the particle filter (PF, red), together with simulated realisations of the SEIR epidemic model (solid grey lines) for $R_0=\beta/\lambda=1.12$ (left) and $4.\overline{6}$ (right).
  • ...and 10 more figures
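The Gaussian summaries described in these captions can be computed directly from a mean and covariance. The function names in the sketch below are hypothetical. The first function gives the median and symmetric 80% credible interval of a univariate normal marginal, as in the Gaussian-approximation bands of Figures 3 and 5. For a Gaussian, the median equals the mean and the 80% interval is the mean plus or minus about 1.2816 standard deviations. The second function gives the semi-axes and orientation of the one-standard-deviation ellipse of a bivariate normal, such as the (exposed, infectious) ellipses drawn in Figure 1.

```python
import math
from statistics import NormalDist


def gaussian_credible_interval(mean, sd, level=0.8):
    """Median and symmetric credible interval of N(mean, sd^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.2816 when level=0.8
    return mean, (mean - z * sd, mean + z * sd)


def one_sd_ellipse(cov):
    """Semi-axes and orientation of the contour (x-mu)^T cov^{-1} (x-mu) = 1.

    cov is a symmetric 2x2 covariance [[a, b], [b, d]], e.g. for
    (exposed, infectious). Returns (major, minor, angle_in_radians), where
    the angle is that of the major axis measured from the first coordinate.
    """
    (a, b), (_, d) = cov
    half_trace = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam_major = half_trace + radius
    lam_minor = max(0.0, half_trace - radius)  # guard against rounding below 0
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return math.sqrt(lam_major), math.sqrt(lam_minor), angle


print(gaussian_credible_interval(500.0, 40.0))
print(one_sd_ellipse([[2.0, 1.0], [1.0, 2.0]]))
```

The ellipse uses the closed-form eigenvalues of a 2x2 symmetric matrix, so no linear-algebra library is required. The interval function applies only to the Gaussian filter. The PF bands in Figure 5 would instead come from empirical quantiles of the particles.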