Accurate Estimation of Diffusion Coefficients and their Uncertainties from Computer Simulation
Andrew R. McCluskey, Samuel W. Coles, Benjamin J. Morgan
TL;DR
This work introduces an approximate Bayesian regression framework for estimating the self-diffusion coefficient $D^*$ from a single molecular dynamics trajectory. The observed MSDs are modelled as a multivariate normal distribution with a covariance structure derived for freely diffusing particles. A model covariance matrix $\boldsymbol{\Sigma}'$ is parameterised from the observed variances and the numbers of independent observations, and MCMC sampling of the linear models compatible with this distribution yields near-optimal estimates of $D^*$ and accurate uncertainty estimates from single trajectories. Validation on both a 3D lattice random walk and the LLZO solid electrolyte shows that the posterior distribution $p(D^*|\boldsymbol{x})$ closely matches the theoretically optimal distribution obtained with a converged covariance matrix, while greatly improving statistical efficiency over OLS/WLS. The approach reduces computational cost and provides robust uncertainty quantification, enabling reliable comparisons across systems and conditions. The analysis is implemented in the open-source package kinisi [mccluskey_kinisi_2022].
Abstract
Self-diffusion coefficients, $D^*$, are routinely estimated from molecular dynamics simulations by fitting a linear model to the observed mean-squared displacements (MSDs) of mobile species. MSDs derived from simulation exhibit statistical noise that causes uncertainty in the resulting estimate of $D^*$. An optimal scheme for estimating $D^*$ minimises this uncertainty, i.e., it has high statistical efficiency, and also gives an accurate estimate of the uncertainty itself. We present a scheme that estimates $D^*$ from a single simulation trajectory with high statistical efficiency and accurately estimates the uncertainty in the predicted value. The statistical distribution of MSDs observable from a given simulation is modelled as a multivariate normal distribution using an analytical covariance matrix for an equivalent system of freely diffusing particles, which we parameterise from the available simulation data. We use Bayesian regression to sample the distribution of linear models that are compatible with this multivariate normal distribution, to obtain a statistically efficient estimate of $D^*$ and an accurate estimate of the associated statistical uncertainty.
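For orientation, the conventional fitting procedure that the abstract takes as its starting point can be sketched in a few lines of Python. This is a minimal illustration, not the Bayesian scheme presented in this work and not the kinisi API. The function names (`lattice_walk`, `mean_squared_displacement`, `ols_diffusion_coefficient`), the walker count, the trajectory length and the maximum lag are all assumptions chosen for the example. It simulates independent unit-step walkers on a 3D cubic lattice, computes the MSD over overlapping time origins, and estimates $D^*$ from the slope of an ordinary least-squares line using $\mathrm{MSD}(t) = 2dD^*t$ with $d = 3$.

```python
import numpy as np


def lattice_walk(n_particles, n_steps, rng):
    """Positions of independent walkers taking unit steps on a cubic lattice."""
    # For every particle and step, pick an axis (x, y or z) and a direction.
    axes = rng.integers(0, 3, size=(n_particles, n_steps, 1))
    signs = rng.choice([-1.0, 1.0], size=(n_particles, n_steps, 1))
    steps = np.zeros((n_particles, n_steps, 3))
    np.put_along_axis(steps, axes, signs, axis=2)
    # Prepend the origin so that positions[:, 0] is the starting point.
    start = np.zeros((n_particles, 1, 3))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)


def mean_squared_displacement(positions, max_lag):
    """MSD at lags 1..max_lag, averaged over particles and overlapping time origins."""
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(max_lag)
    for i, lag in enumerate(lags):
        disp = positions[:, lag:] - positions[:, :-lag]
        msd[i] = np.mean(np.sum(disp**2, axis=-1))
    return lags, msd


def ols_diffusion_coefficient(lags, msd, dimensions=3):
    """Slope of an ordinary least-squares line through the MSD, divided by 2d."""
    slope, _intercept = np.polyfit(lags, msd, 1)
    return slope / (2 * dimensions)


# Illustrative settings (assumed, not taken from the paper).
rng = np.random.default_rng(0)
positions = lattice_walk(n_particles=128, n_steps=512, rng=rng)
lags, msd = mean_squared_displacement(positions, max_lag=64)
d_ols = ols_diffusion_coefficient(lags, msd)
print(f"OLS estimate of D*: {d_ols:.4f} (exact value for this walk: {1 / 6:.4f})")
```

With unit step length and unit time step, the exact value for this walk is $D^* = 1/6$. The OLS fit treats every MSD point as independent and equally uncertain, whereas MSD values at different lags from one trajectory are correlated and have different variances. The fit therefore returns a point estimate but no trustworthy uncertainty, which is the gap the multivariate normal model described above is intended to close.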
