Amortized Phylodynamic Inference with Neural Bayes Estimators and Recursive Neural Networks

Alexander E. Zarebski, Thomas Williams, Louis du Plessis

TL;DR

A neural Bayes estimator for key epidemic quantities (the reproduction number, prevalence, and cumulative infections through time) that uses a recursive neural network to embed the tree and a prediction network, conditioned on time and quantile level, to generate the estimates.
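The architecture summarized above can be sketched in plain Python. This is a minimal, untrained illustration under assumptions of our own: the tree is given as nested `Node` objects carrying branch lengths, child embeddings are pooled by summation (so the embedding does not depend on the order of children), layers use tanh, and the prediction network receives the tree embedding concatenated with the query time `t` and quantile level `tau`. The names `Node`, `TreeEmbedder`, and `PredictionHead`, and all layer sizes, are hypothetical and not taken from the paper.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]


@dataclass
class Node:
    """Node of a rooted tree; `branch_length` is the length of the edge above the node."""
    branch_length: float
    children: List["Node"] = field(default_factory=list)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    """Random (untrained) weights and zero biases for a dense layer."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_tanh(layer: Layer, x: List[float]) -> List[float]:
    """Compute tanh(W x + b) for a dense layer (W, b)."""
    weights, bias = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


class TreeEmbedder:
    """Recursive network: a leaf is embedded from its branch length; an internal node
    combines the sum of its children's embeddings with its own branch length."""

    def __init__(self, dim: int, rng: random.Random):
        self.leaf = init_layer(dim, 1, rng)
        self.internal = init_layer(dim, dim + 1, rng)

    def embed(self, node: Node) -> List[float]:
        if not node.children:
            return dense_tanh(self.leaf, [node.branch_length])
        child_embeddings = [self.embed(child) for child in node.children]
        pooled = [sum(vals) for vals in zip(*child_embeddings)]  # ignores child order
        return dense_tanh(self.internal, pooled + [node.branch_length])


class PredictionHead:
    """MLP mapping (tree embedding, query time t, quantile level tau) to a scalar estimate."""

    def __init__(self, dim: int, hidden: int, rng: random.Random):
        self.hidden = init_layer(hidden, dim + 2, rng)
        self.out_w = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]

    def __call__(self, embedding: List[float], t: float, tau: float) -> float:
        h = dense_tanh(self.hidden, embedding + [t, tau])
        return sum(w * hi for w, hi in zip(self.out_w, h))


rng = random.Random(0)
tree = Node(0.0, [Node(1.0, [Node(0.5), Node(0.7)]), Node(2.0)])
embedder = TreeEmbedder(dim=8, rng=rng)
head = PredictionHead(dim=8, hidden=16, rng=rng)
z = embedder.embed(tree)  # one fixed-length vector for the whole tree
estimates = {tau: head(z, t=1.5, tau=tau) for tau in (0.025, 0.5, 0.975)}
print(estimates)
```

Because the weights here are random, the three outputs are not meaningful quantiles and need not even be ordered; training with a quantile loss is what would give them that interpretation. A useful property of this design is that the tree is embedded once and the head can then be queried at any time and quantile level.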

Abstract

Phylodynamics is used to estimate epidemic dynamics from phylogenetic trees or genomic sequences of pathogens, but the required likelihood calculations can be challenging for complex models. We present a neural Bayes estimator (NBE) for key epidemic quantities: the reproduction number, prevalence, and cumulative infections through time. By performing quantile regression over tree space, the NBE estimates posterior medians and credible intervals directly from a reconstructed tree. Our approach uses a recursive neural network as a tree embedding network, together with a prediction network conditioned on time and quantile level, to generate the estimates. In simulation studies, the NBE achieves good predictive performance, with conservative uncertainty estimates. Compared with a BEAST2 fixed-tree analysis, the NBE gives less biased estimates of time-varying reproduction numbers in our test setting. Under a misspecified sampling model, the NBE's performance degrades (as expected) but remains reasonable, and fine-tuning a pre-trained model yields estimates comparable to those from a model trained from scratch, at substantially lower computational cost.
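Quantile regression is usually trained with the pinball (check) loss. The sketch below assumes this standard loss (the abstract does not name the loss used) and demonstrates its key property: minimizing the average pinball loss at level `tau` over a constant prediction recovers the `tau`-quantile of the samples. Since the Bayes estimator under the pinball loss is the posterior `tau`-quantile, a network trained on (tree, parameter) pairs simulated from the prior, with `tau` supplied as an input, can in principle output posterior medians (`tau = 0.5`) and the endpoints of 95% credible intervals (`tau = 0.025` and `0.975`). The function names and sample values are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Quantile (pinball) loss at level tau in (0, 1): under-predictions are weighted by tau,
    over-predictions by 1 - tau."""
    diff = y_true - y_pred
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball(ys: Sequence[float], q: float, tau: float) -> float:
    """Average pinball loss of the constant prediction q over the samples ys."""
    return sum(pinball_loss(y, q, tau) for y in ys) / len(ys)


# Toy samples (illustrative only), standing in for draws from a posterior distribution.
samples = [0.8, 1.1, 1.3, 1.5, 1.6, 1.9, 2.2, 2.4, 3.0, 3.5, 4.1]

# The mean pinball loss is piecewise linear in q, so a minimiser lies at a sample value;
# searching over the samples therefore finds the empirical tau-quantile.
estimates = {
    tau: min(samples, key=lambda q: mean_pinball(samples, q, tau))
    for tau in (0.1, 0.5, 0.9)
}
print(estimates)  # {0.1: 1.1, 0.5: 1.9, 0.9: 3.5}
```

With 11 samples the 0.5 level picks the 6th smallest value (the median), and the 0.1 and 0.9 levels pick the 2nd and 10th smallest.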

Paper Structure (24 sections, 6 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The transmission process is modeled as a birth-death process, in which transmission of the pathogen is represented by the birth of a new lineage and the end of an infectious period by the death of a lineage. A. The transmission tree is a complete description of the transmission and observation processes. The epidemic starts from a single infectious individual; new infections are indicated with grey arrows, and an infection may end with the pathogen genome being sequenced (filled dot). At the end of this example, the prevalence is three ongoing infections and the cumulative number of infections is 13. B. The reconstructed tree describes the connections between the sequenced infections. The time of the final sequenced infection is designated as the present.
  • Figure 2: The predictions and $95\%$ credible intervals on the testing data for the three prediction targets: (A) the reproduction number, (B) the common logarithm of the prevalence of infection, and (C) the common logarithm of the cumulative number of infections. The bar for each point is blue if the estimated credible interval contains the true value and red otherwise. The dashed black line indicates perfect agreement and the solid black line is a least-squares fit.
  • Figure 3: Example realization of the birth-death-sampling process and the estimates of the reproduction number obtained from it using both MCMC and the NBE. A. The simulated tree and the time-varying reproduction number and proportion of infections sampled. B. Estimates of the effective reproduction number produced by BDSky MCMC and by the NBE, with their $95\%$ credible intervals, alongside the true value. The NBE estimates do not make a piecewise-constant assumption, so they vary smoothly through time, whereas the MCMC uses a piecewise-constant estimator.
  • Figure 4: Learning curves show that the training and validation losses have effectively converged by epoch 500. The validation loss is lower than the training loss because dropout was applied when assessing the training loss. The vertical dashed line indicates the epoch with the lowest validation loss.
  • Figure 5: NBEs can be rapidly retrained to adapt to different prior distributions. (a) and (b) show the basic and noisy prior distributions considered. (c) shows the true values of the basic reproduction number, sampled from the noisy prior, together with their estimates. The models used to generate the estimates start either from a random initialization (top row) or from a model pre-trained on data sampled from the basic prior. We show the performance for three levels of training: no training (for the pre-trained basic model), fine-tuning (for both models), and full training (from the random initialization). For each level of training we also show the indicative wall time required.
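The process in Figure 1 can be simulated with the Gillespie algorithm. The sketch below assumes constant rates (the scenarios in Figure 3 let the reproduction number and sampling proportion vary through time) and tracks only counts, not the transmission or reconstructed tree. Its parameterization is an assumption: each infection transmits at rate `birth_rate`, ends at rate `end_rate`, and is sequenced at its end with probability `sample_prob`. Under this parameterization the reproduction number is `birth_rate / end_rate`, prevalence is the number of ongoing infections, and cumulative infections count every infection, including the index case. The function name `simulate_birth_death` is hypothetical.

```python
import random
from typing import Optional, Tuple


def simulate_birth_death(
    birth_rate: float,
    end_rate: float,
    sample_prob: float,
    t_max: float,
    max_infections: int = 10_000,
    seed: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Gillespie simulation of a constant-rate birth-death-sampling process from one infection.

    Returns (prevalence, cumulative infections, number sequenced) at `t_max`, or earlier if
    the epidemic dies out or `max_infections` is reached.
    """
    rng = random.Random(seed)
    t = 0.0
    prevalence, cumulative, n_sequenced = 1, 1, 0
    while prevalence > 0 and cumulative < max_infections:
        t += rng.expovariate(prevalence * (birth_rate + end_rate))  # time to next event
        if t > t_max:
            break
        if rng.random() < birth_rate / (birth_rate + end_rate):  # transmission (birth)
            prevalence += 1
            cumulative += 1
        else:  # end of an infectious period (death), possibly sequenced
            prevalence -= 1
            if rng.random() < sample_prob:
                n_sequenced += 1
    return prevalence, cumulative, n_sequenced


birth_rate, end_rate = 1.5, 1.0  # reproduction number R = birth_rate / end_rate = 1.5
prevalence, cumulative, n_sequenced = simulate_birth_death(
    birth_rate, end_rate, sample_prob=0.5, t_max=5.0, seed=1
)
print(f"R = {birth_rate / end_rate}: prevalence {prevalence}, "
      f"cumulative {cumulative}, sequenced {n_sequenced}")
```

Every infection is either ongoing or ended, so `cumulative - prevalence` is the number of ended infections, of which only a fraction are sequenced and appear as tips of the reconstructed tree.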
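The two summaries drawn in Figure 2 can be computed as below. The first is the empirical coverage, i.e. the fraction of blue bars; a value above the nominal 95% indicates conservative intervals. The second is the least-squares line. The function names and toy values are hypothetical (not results from the paper), and the axis assignment is an assumption: the true value is treated as the regressor and the point estimate (e.g. the posterior median) as the response.

```python
from typing import Sequence, Tuple


def interval_coverage(truths: Sequence[float], lowers: Sequence[float],
                      uppers: Sequence[float]) -> float:
    """Fraction of intervals [lower, upper] that contain the true value."""
    hits = [lo <= y <= hi for y, lo, hi in zip(truths, lowers, uppers)]
    return sum(hits) / len(hits)


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares fit ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy values for illustration only (not results from the paper).
truths = [1.0, 2.0, 3.0, 4.0]
medians = [1.1, 1.9, 3.2, 3.8]
lowers = [0.8, 1.5, 3.1, 3.0]
uppers = [1.4, 2.3, 3.6, 4.5]

coverage = interval_coverage(truths, lowers, uppers)  # 3 of 4 intervals contain the truth
slope, intercept = least_squares_line(truths, medians)
print(f"coverage = {coverage:.2f}, fit: median = {intercept:.2f} + {slope:.2f} * truth")
```

Perfect agreement corresponds to slope 1 and intercept 0 (the dashed line); the toy fit gives slope 0.94 and intercept 0.15.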
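The explanation in Figure 4 rests on dropout being active only in training mode, and the dashed line marks an argmin over epochs. Both are sketched below, assuming the common "inverted" dropout formulation (the one PyTorch's `nn.Dropout` uses, which rescales retained activations by 1/(1 - p) during training and is the identity at evaluation). The caption does not state which variant or framework was used. Because randomly zeroed units perturb the network's outputs, a loss evaluated with dropout on is typically higher than one evaluated with it off, which is consistent with the validation curve lying below the training curve. The function names and the loss curve are hypothetical.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], p: float, training: bool,
            rng: random.Random) -> List[float]:
    """Inverted dropout: while training, zero each activation with probability p and scale
    the survivors by 1 / (1 - p); at evaluation time, pass activations through unchanged."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def best_epoch(val_losses: Sequence[float]) -> int:
    """Zero-based index of the epoch with the lowest validation loss (earliest on ties)."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


rng = random.Random(0)
h = [0.5, -1.0, 2.0, 0.25]
print(dropout(h, p=0.5, training=True, rng=rng))   # some entries zeroed, others doubled
print(dropout(h, p=0.5, training=False, rng=rng))  # identical to h

val_losses = [0.90, 0.55, 0.41, 0.43, 0.41, 0.44]  # toy curve, not from the paper
print(best_epoch(val_losses))  # 2
```

Keeping the weights from the best validation epoch, rather than the last one, is what the dashed line in Figure 4 suggests; whether the paper checkpoints this way is not stated in the caption.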