Marginal Likelihood Inference for Fitting Dynamical Survival Analysis Models to Epidemic Count Data
Suchismita Roy, Alexander A. Fisher, Jason Xu
TL;DR
Partially observed stochastic epidemic data create intractable marginal likelihoods under standard CTMC formulations. The authors derive a closed-form marginal count likelihood within the Dynamical Survival Analysis (DSA) framework by replacing the population hazard with its large-population limit, yielding a count-based likelihood that depends only on the ODE solution at observation times and is independent of the population size $N$. The marginal likelihood $l(\theta|\{Y_j\}) = s_T^{N-K} \prod_{j=1}^P (s_{\xi_{j-1}} - s_{\xi_j})^{Y_j}$ extends to frailty and network variants, enabling efficient Bayesian inference and scalable model generalization. Through simulations and real data on Ebola and COVID-19, the method demonstrates competitive parameter recovery with substantially reduced computational cost and increased modeling flexibility for partially observed epidemics.
Abstract
Stochastic compartmental models are prevalent tools for describing disease spread, but inference under these models is challenging for many types of surveillance data when the marginal likelihood function becomes intractable due to missing information. To address this, we develop a closed-form likelihood for discretely observed incidence count data under the dynamical survival analysis (DSA) paradigm. The method approximates the stochastic population-level hazard by a large population limit while retaining a count-valued stochastic model, and leads to survival analytic inferential strategies that are both computationally efficient and flexible to model generalizations. Through simulation, we show that parameter estimation is competitive with recent exact but computationally expensive likelihood-based methods in partially observed settings. Previous work has shown that the DSA approximation is generalizable, and we show that the inferential developments here also carry over to models featuring individual heterogeneity, such as frailty models. We consider case studies of both Ebola and COVID-19 data on variants of the model, including a network-based epidemic model and a model with distributions over susceptibility, demonstrating its flexibility and practical utility on real, partially observed datasets.
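To make the closed-form count likelihood concrete, the following is a minimal Python sketch of the log of $s_T^{N-K} \prod_{j=1}^P (s_{\xi_{j-1}} - s_{\xi_j})^{Y_j}$. It is an illustration under stated assumptions, not the authors' implementation: it assumes the susceptible curve $s_t$ solves the standard mean-field SIR equations $\dot s = -\beta s i$, $\dot i = \beta s i - \gamma i$ with $s_0 = 1$ and $i_0 = \rho$, that $T = \xi_P$, and that $K = \sum_j Y_j$. The function names, the parameter names `beta`, `gamma`, `rho`, and the fixed-step RK4 integrator are all hypothetical choices made for this example.

```python
import math


def _sir_rhs(s, i, beta, gamma):
    """Right-hand side of the assumed mean-field SIR system for (s, i)."""
    return -beta * s * i, beta * s * i - gamma * i


def susceptible_curve(beta, gamma, rho, times, dt=0.01):
    """Return s(t) at each time in `times` (increasing), using fixed-step RK4.

    Assumes s(0) = 1 and i(0) = rho; `dt` is the maximum step size.
    """
    s, i, t = 1.0, rho, 0.0
    out = []
    for t_next in times:
        n = max(1, math.ceil((t_next - t) / dt))
        h = (t_next - t) / n
        for _ in range(n):
            k1s, k1i = _sir_rhs(s, i, beta, gamma)
            k2s, k2i = _sir_rhs(s + 0.5 * h * k1s, i + 0.5 * h * k1i, beta, gamma)
            k3s, k3i = _sir_rhs(s + 0.5 * h * k2s, i + 0.5 * h * k2i, beta, gamma)
            k4s, k4i = _sir_rhs(s + h * k3s, i + h * k3i, beta, gamma)
            s += h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
            i += h * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        t = t_next
        out.append(s)
    return out


def dsa_count_loglik(beta, gamma, rho, obs_times, counts, N):
    """Log of s_T^(N-K) * prod_j (s_{xi_{j-1}} - s_{xi_j})^{Y_j}.

    obs_times = [xi_0, ..., xi_P], counts = [Y_1, ..., Y_P].
    Assumes T = xi_P and K = sum of counts (assumptions of this sketch).
    """
    if len(obs_times) != len(counts) + 1:
        raise ValueError("need P + 1 observation times for P counts")
    s = susceptible_curve(beta, gamma, rho, obs_times)
    K = sum(counts)
    loglik = (N - K) * math.log(s[-1])
    for j, y in enumerate(counts, start=1):
        if y > 0:
            # Probability mass of becoming infected in (xi_{j-1}, xi_j].
            loglik += y * math.log(s[j - 1] - s[j])
    return loglik


if __name__ == "__main__":
    # Hypothetical weekly counts, purely for illustration.
    times = [0, 7, 14, 21, 28]
    weekly_counts = [3, 12, 40, 65]
    print(dsa_count_loglik(0.4, 0.2, 0.005, times, weekly_counts, N=1000))
```

Because each evaluation needs only one ODE solve at the observation times, a function of this shape can be passed directly to a numerical optimizer or used inside an MCMC sampler.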
