Here Be Dragons: Bimodal posteriors arise from numerical integration error in longitudinal models
Tess O'Brien, Matthew T. Moores, David Warton, Daniel Falster
TL;DR
This paper reveals that numerical integration error in longitudinal differential-equation models can create a previously undescribed form of non-identifiability, manifesting as bimodal posteriors in Bayesian parameter estimation. The authors prove, for linear first-order ODEs under RK4 integration, that multiple $eta_1$ values can produce indistinguishable trajectories when $eta_0=\alpha\beta_1$, formalized by the root condition $0 = h\hat{\beta}_1 - \frac{h^2}{2}\hat{\beta}_1^2 + \frac{h^3}{6}\hat{\beta}_1^3 - \frac{h^4}{24}\hat{\beta}_1^4 + e^{-\beta_1 h} - 1$, which yields real roots near the true and an erroneous alternative (e.g., approximately 1.0008 and 4.9112 for $\beta_0=10,\beta_1=1$). They corroborate this with empirical simulations across RK4 step sizes, showing two robust posterior modes: one near the true parameter values and another driven by numerical artefacts; switching to an analytic solution eliminates the pathology, while RK45 can mitigate under certain priors. The paper thus provides a practical simulation workflow to detect numerically induced bimodality and warns that numerical integration choices should be treated as part of the statistical model in Bayesian inverse problems. Overall, the work highlights a fundamental interaction between numerical methods and identifiability, with direct implications for longitudinal Bayesian inference and the design of robust estimation workflows.
Abstract
Longitudinal models with dynamics governed by differential equations may require numerical integration alongside parameter estimation. We have identified a situation where the numerical integration introduces error in such a way that it becomes a novel source of non-uniqueness in estimation. We obtain two very different sets of parameters, one of which is a good estimate of the true values and the other a very poor one. The two estimates have forward numerical projections statistically indistinguishable from each other because of numerical error. In such cases, the posterior distribution for parameters is bimodal, with a dominant mode closer to the true parameter value, and a second cluster around the errant value. We demonstrate that multi-modality exists both theoretically and empirically for an affine first order differential equation, that a simulation workflow can test for evidence of the issue more generally, and that Markov Chain Monte Carlo sampling with a suitable solution can avoid bimodality. The issue of multi-modal posteriors arising from numerical error has consequences for Bayesian inverse methods that rely on numerical integration more broadly.
