Comparing Bayesian and Frequentist Inference in Biological Models: A Comparative Analysis of Accuracy, Uncertainty, and Identifiability

Mohammed A. Y. Mohammed, Hamed Karami, Gerardo Chowell

TL;DR

The study systematically compares Bayesian inference (BayesianFitForecast, built on Hamiltonian Monte Carlo in Stan) with Frequentist inference (QuantDiffForecast, nonlinear least squares with parametric bootstrap) for three ODE-based biological models under a common normal-error framework. By analyzing the Lotka-Volterra (LV), generalized logistic (GLM), and SEIUR models across four datasets, and by conducting structural identifiability analyses, the authors show that rich, fully observed data favor the Frequentist approach, which delivers accurate point estimates and computational efficiency, while latent-state uncertainty and partial observability favor the Bayesian approach, which offers better uncertainty quantification and robust diagnostics. Structural identifiability explains these patterns: when parameters are identifiable from the observed data, both frameworks perform well; when identifiability is compromised, Bayesian regularization stabilizes inference and forecasting. The work provides practical guidance on choosing a framework according to data richness, observability, and the relative importance of uncertainty calibration versus point prediction, and it emphasizes that identifiability analysis should precede estimation.

Abstract

Mathematical models support inference and forecasting in ecology and epidemiology, but results depend on the estimation framework. We compare Bayesian and Frequentist approaches across three biological models using four datasets: Lotka-Volterra predator-prey dynamics (Hudson Bay), a generalized logistic model (lung injury and 2022 U.S. mpox), and an SEIUR epidemic model (COVID-19 in Spain). Both approaches use a normal error structure to ensure a fair comparison. We first assessed structural identifiability to determine which parameters can theoretically be recovered from the data. We then evaluated practical identifiability and forecasting performance using four metrics: mean absolute error (MAE), mean squared error (MSE), 95 percent prediction interval (PI) coverage, and weighted interval score (WIS). For the Lotka-Volterra model with both prey and predator data, we analyzed three scenarios: prey only, predator only, and both. The Frequentist workflow used QuantDiffForecast (QDF) in MATLAB, which fits ODE models via nonlinear least squares and quantifies uncertainty through parametric bootstrap. The Bayesian workflow used BayesianFitForecast (BFF), which employs Hamiltonian Monte Carlo sampling via Stan to generate posterior distributions and diagnostics such as the Gelman-Rubin R-hat statistic. Results show that Frequentist inference performs best when data are rich and fully observed, while Bayesian inference excels when latent-state uncertainty is high and data are sparse, as in the SEIUR COVID-19 model. Structural identifiability clarifies these patterns: full observability benefits both frameworks, while limited observability constrains parameter recovery. This comparison provides guidance for choosing inference frameworks based on data richness, observability, and uncertainty needs.
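
The four performance metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the QDF or BFF implementation. The function names and toy numbers are assumptions made for illustration, and the WIS follows the commonly used definition built from a median and a set of central prediction intervals, which may differ in detail from the toolboxes' own code.

```python
from statistics import mean


def mae(observed, predicted):
    """Mean absolute error between observations and point forecasts."""
    return mean(abs(y - f) for y, f in zip(observed, predicted))


def mse(observed, predicted):
    """Mean squared error between observations and point forecasts."""
    return mean((y - f) ** 2 for y, f in zip(observed, predicted))


def pi_coverage(observed, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, up in zip(observed, lower, upper) if lo <= y <= up)
    return 100.0 * hits / len(observed)


def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower  # width penalty
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # observation below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # observation above the interval
    return score


def wis(y, median, intervals):
    """Weighted interval score for a single observation.

    `intervals` maps alpha to (lower, upper), e.g. {0.05: (l, u)} for a 95% PI.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(y, lower, upper, alpha)
    return total / (len(intervals) + 0.5)


# Toy numbers, invented purely for illustration.
observed = [10.0, 12.0, 15.0]
predicted = [11.0, 12.0, 13.0]
lower_95 = [8.0, 9.0, 10.0]
upper_95 = [12.0, 14.0, 14.0]

print(mae(observed, predicted))
print(mse(observed, predicted))
print(pi_coverage(observed, lower_95, upper_95))
print(wis(15.0, 13.0, {0.05: (10.0, 14.0)}))
```

MAE and MSE score only the point forecast, coverage checks only whether the interval contains the observation, and WIS combines interval width with a penalty for misses. This is why the metrics can rank the two frameworks differently.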

Paper Structure

This paper contains 53 sections, 18 equations, 31 figures, 29 tables.

Figures (31)

  • Figure 1: Time series of ecological and epidemic population dynamics. The Hudson Bay Lynx--Hare dataset shows annual prey and predator abundance (1900--1920), while the remaining panels depict weekly or daily reported cases from major disease outbreaks: Lung Injury (EVALI, US, 2019), Mpox (US, 2022), and COVID-19 (Spain, 2020).
  • Figure 2: Lotka--Volterra predator--prey diagram. Circles represent the prey $x$ and predator $y$ populations. The self-loop on $x$ indicates intrinsic growth at rate $\alpha$. Curved arrows between $x$ and $y$ represent interactions: prey loss due to predation and predator growth from consuming prey. The downward arrow on $y$ represents natural mortality.
  • Figure 3: Diagram of the GLM. The circle represents the cumulative cases $C(t)$. The self-loop indicates growth governed by the generalized logistic equation. The dashed arrow indicates the observed incidence.
  • Figure 4: Compartmental diagram of the SEIUR model with underreporting. Circles represent the epidemiological compartments. Solid arrows indicate transitions between compartments, and the dashed arrow indicates the source of observed cases.
  • Figure 5: Fitting visualization for the Lotka--Volterra model, with Hudson Bay lynx-hare data, observing both predator and prey, using QDF and BFF approaches. Top row: BFF. Bottom row: QDF. Left column: predator. Right column: prey.
  • ...and 26 more figures
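
The structure described in the Figure 2 caption can be sketched as a small simulation. Only $\alpha$ is named in the caption. The symbols `beta` (predation rate), `delta` (predator growth per prey consumed), and `gamma` (predator mortality) are hypothetical names following the textbook Lotka-Volterra form, and the fixed-step Runge-Kutta integrator is a stand-in chosen to keep the example dependency-free, not the solver used by QDF or BFF.

```python
def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side of the classic Lotka-Volterra system.

    x: prey, y: predator. alpha is the intrinsic prey growth rate from the
    caption; beta, delta, gamma are assumed names for the remaining rates.
    """
    x, y = state
    dx = alpha * x - beta * x * y   # intrinsic growth minus loss to predation
    dy = delta * x * y - gamma * y  # growth from consuming prey minus mortality
    return (dx, dy)


def rk4_step(f, state, dt, *params):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * dt * ki for s, ki in zip(state, k))

    k1 = f(state, *params)
    k2 = f(shifted(k1, 0.5), *params)
    k3 = f(shifted(k2, 0.5), *params)
    k4 = f(shifted(k3, 1.0), *params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, y0, params, t_end, dt=0.01):
    """Return the list of (prey, predator) states from t = 0 to t_end."""
    n_steps = int(round(t_end / dt))
    state = (x0, y0)
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(lotka_volterra, state, dt, *params)
        trajectory.append(state)
    return trajectory


# Illustrative parameter values only; they are not estimates from the paper.
params = (1.0, 0.5, 0.25, 0.5)  # alpha, beta, delta, gamma
trajectory = simulate(3.0, 1.0, params, t_end=20.0)
prey_final, predator_final = trajectory[-1]
print(prey_final, predator_final)
```

In this form the coexistence equilibrium sits at prey `gamma / delta` and predator `alpha / beta`. Starting a run there should leave both populations unchanged, which is a quick sanity check on the sign conventions.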