Pair-based estimators of infection and removal rates for stochastic epidemic models

Seth D. Temple; Jonathan Terhorst

Abstract

Stochastic epidemic models can be used to estimate infection and removal rates, and derived quantities such as the basic reproductive number ($R_0$), when both infection and removal times are observed. In practice, however, removal times are often available while infection times are not, and existing methods that rely only on removal times can become unstable or biased. We study inference for stochastic SIR/SEIR models in a partial-observation setting. We develop imputation-based estimators that use a small calibration sample of fully observed infectious periods, derive closed-form expressions for the pairwise exposure terms they require, and use a studentized parametric bootstrap for bias correction and uncertainty quantification. In simulations, removal-time-only methods performed poorly in moderate to large $R_0$ scenarios, while observing even tens of complete infectious periods substantially improved the estimation of the infection rate. A reanalysis of the 1861 Hagelloch measles outbreak under simulated missingness recovered stable qualitative differences in transmission between school classes. Based on our results, we advocate for the targeted collection of a modest number of complete infectious periods as a means of improving surveillance in the early stages of an epidemic.
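To make the fully observed setting concrete, the following is a minimal sketch, not the authors' implementation, of a homogeneous stochastic SIR epidemic simulated with the Gillespie algorithm, followed by the textbook complete-data maximum likelihood estimates. It assumes one initial infective, a per-pair infection rate of $\beta/N$ (so that $R_0 = \beta/\gamma$), and exponential infectious periods; the paper's pair-specific rates $\beta_{kj}$ and Erlang periods are not modeled here. The function names `simulate_sir` and `complete_data_mle` are hypothetical.

```python
import random


def simulate_sir(n_susceptible, beta, gamma, rng):
    """Gillespie simulation of a homogeneous stochastic SIR epidemic.

    Assumed convention (not taken from the paper): one initial infective at
    time 0, n_susceptible initial susceptibles, per-pair rate beta / n_susceptible.
    Returns infection times, removal times, and the integrals of S(t)I(t) and I(t).
    """
    t = 0.0
    s, i = n_susceptible, 1
    infection_times = [0.0]
    removal_times = []
    exposure = 0.0          # integral of S(t) * I(t) dt
    infectious_time = 0.0   # integral of I(t) dt
    while i > 0:
        rate_infection = beta * s * i / n_susceptible
        rate_removal = gamma * i
        total_rate = rate_infection + rate_removal
        dt = rng.expovariate(total_rate)   # waiting time to the next event
        exposure += s * i * dt
        infectious_time += i * dt
        t += dt
        if rng.random() < rate_infection / total_rate:
            s, i = s - 1, i + 1            # infection event
            infection_times.append(t)
        else:
            i -= 1                         # removal event
            removal_times.append(t)
    return infection_times, removal_times, exposure, infectious_time


def complete_data_mle(n_susceptible, infection_times, removal_times,
                      exposure, infectious_time):
    """Complete-data MLEs when every infection and removal time is observed."""
    n_new_infections = len(infection_times) - 1   # exclude the initial infective
    beta_hat = n_new_infections * n_susceptible / exposure
    gamma_hat = len(removal_times) / infectious_time
    return beta_hat, gamma_hat
```

Calling `simulate_sir(100, 2.0, 1.0, random.Random(1))` and passing its output to `complete_data_mle` gives one pair of estimates; the ratio `beta_hat / gamma_hat` is then a plug-in estimate of $R_0$ under the assumed parameterization. When infection times are missing, the two integrals above cannot be computed directly, which is the gap that imputation-based estimators aim to fill.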

Paper Structure (30 sections, 8 theorems, 77 equations, 23 figures, 3 tables, 4 algorithms)

Introduction
Preliminary materials
Simulating a pair-based epidemic model
The complete likelihood
The removal-time-only likelihood
Pair-based likelihood approximations
Markov Chain Monte Carlo samplers
Imputation-based estimators of infection and removal rates
Complete-data maximum likelihood estimation
Expectation-maximization estimators
A baseline endpoint-imputation estimator
A $\tau_{kj}$-imputation estimator
A pathology under removal-time-only observation
Uncertainty Quantification
Simulation study
...and 15 more sections

Key Result

lemma 1

If $i_k$ and $i_j$ are observed and $r_k$ is missing, then

Figures (23)

Figure 1: Diagram of the stochastic SIR model. Infectious individuals $\mathcal{I}(t)$ (orange) attempt to infect susceptible individuals $\mathcal{S}(t)$ (yellow) at time $t$. The pair-specific rates $\beta_{kj}$ and $\beta_{kl}$ from infected individual $k$ to susceptible individuals $j$ and $l$ are depicted as orange directed arrows. The three removed individuals $\mathcal{R}(t)$ (green) cannot be reinfected, which is depicted by grey arrows.
Figure 2: SIR estimators that use some versus zero fully observed infectious periods. Box plots show the 2.5th, 25th, 50th, 75th, and 97.5th percentiles of A) infection rate $\beta$ and B) basic reproductive number $R_0$ estimates (y-axis) from different methods (x-axis). The uncorrected EM method $\tilde{\beta}$, the midpoint of a studentized bootstrap interval, and the Bayesian posterior mean came from simulations with $p_1=0.4$ and $p_2=0.8$. The unconditional and conditional PBLA estimates came from simulations with $p_1=0$ and $p_1=0.5$, respectively. The fixed incubation period was 0.
Figure 3: Infection rate estimates in SEIR epidemic models. Box plots show the 2.5th, 25th, 50th, 75th, and 97.5th percentiles of infection rate estimates from A-C) the studentized bootstrap midpoints and D-F) the Bayesian posterior means. The posterior means were calculated from 400 samples after a burn-in period of 100 samples. The default settings were that the expected proportion $p_1$ of completely observed infectious periods was 0.40, the expected proportion $p_2$ of missing infection times was 0.80, and the fixed incubation period $\delta$ was 0. Along the columns, we varied one of these parameters (legend) and held the other two parameters fixed. The susceptible population size $N$ was 100, the minimum epidemic size $n$ was 20, the removal rate $\gamma$ was 1, and the Erlang shape parameter $m$ was 1.
Figure 4: Group-specific reproductive number estimates of the 1861 measles epidemic in Hagelloch, Germany. Error plots show the point estimates and 95% confidence intervals of basic reproductive number estimates (y-axis) for each group (x-axis) from the A) studentized bootstrap or B) Bayesian methods. The y-axis is on the $\log_2$ scale. We varied the expected proportion $p_1$ of fully observed infectious periods (legend). The expected proportion $p_2$ of missing infection times was 0.8. The fixed incubation period was 10 days.
Figure S1: Correlations between estimates and prevalences for different methods. A) The scatter plot shows the Pearson and Spearman (legend) correlations (y-axis) between $\beta$ and $\gamma$ estimates for the different methods (x-axis). B) The scatter plot shows the Pearson correlations between $\beta$ estimates and the prevalences $n/N$ for the bootstrap and Bayesian methods (legend). C-D) The scatter plots show the Pearson correlations between $\beta$ and $\gamma$ estimates for the bootstrap and Bayesian methods (subplot titles) as we varied the expected proportion $p_1$ of fully observed infectious periods. All settings were as in Figure \ref{fig:main}.
...and 18 more figures

Theorems & Definitions (17)

proof
lemma 1
proof
lemma 2
proof
lemma 3
proof
lemma 4
proof
lemma 5
...and 7 more
