Table of Contents
Fetching ...

Accurate Estimation of Diffusion Coefficients and their Uncertainties from Computer Simulation

Andrew R. McCluskey, Samuel W. Coles, Benjamin J. Morgan

TL;DR

This work introduces an approximate Bayesian regression framework to estimate the self-diffusion coefficient $D^*$ from a single molecular dynamics trajectory by modeling MSDs as a multivariate normal distribution with a covariance structure derived from freely diffusing particles. By parameterizing a model covariance $oldsymbol{ msigma}'$ from observed variances and independent observations, and performing MCMC sampling of compatible linear models, the method yields near-optimal estimates of $D^*$ and accurate uncertainty from single trajectories. Validation on both a 3D lattice random walk and the LLZO solid electrolyte demonstrates that the posterior distribution $p(D^*|m{x})$ closely matches the theoretically optimal distribution obtained with a converged covariance, while greatly improving statistical efficiency over OLS/WLS. The approach reduces computational cost and provides robust uncertainty quantification, enabling reliable comparisons across systems and conditions, with the analysis implemented in an open-source package kinisimccluskey_kinisi_2022.

Abstract

Self-diffusion coefficients, $D^*$, are routinely estimated from molecular dynamics simulations by fitting a linear model to the observed mean-squared displacements (MSDs) of mobile species. MSDs derived from simulation exhibit statistical noise that causes uncertainty in the resulting estimate of $D^*$. An optimal scheme for estimating $D^*$ minimises this uncertainty, i.e., it will have high statistical efficiency, and also gives an accurate estimate of the uncertainty itself. We present a scheme for estimating $\D$ from a single simulation trajectory with high statistical efficiency and accurately estimating the uncertainty in the predicted value. The statistical distribution of MSDs observable from a given simulation is modelled as a multivariate normal distribution using an analytical covariance matrix for an equivalent system of freely diffusing particles, which we parameterise from the available simulation data. We use Bayesian regression to sample the distribution of linear models that are compatible with this multivariate normal distribution, to obtain a statistically efficient estimate of $D^*$ and an accurate estimate of the associated statistical uncertainty.

Accurate Estimation of Diffusion Coefficients and their Uncertainties from Computer Simulation

TL;DR

This work introduces an approximate Bayesian regression framework to estimate the self-diffusion coefficient from a single molecular dynamics trajectory by modeling MSDs as a multivariate normal distribution with a covariance structure derived from freely diffusing particles. By parameterizing a model covariance from observed variances and independent observations, and performing MCMC sampling of compatible linear models, the method yields near-optimal estimates of and accurate uncertainty from single trajectories. Validation on both a 3D lattice random walk and the LLZO solid electrolyte demonstrates that the posterior distribution closely matches the theoretically optimal distribution obtained with a converged covariance, while greatly improving statistical efficiency over OLS/WLS. The approach reduces computational cost and provides robust uncertainty quantification, enabling reliable comparisons across systems and conditions, with the analysis implemented in an open-source package kinisimccluskey_kinisi_2022.

Abstract

Self-diffusion coefficients, , are routinely estimated from molecular dynamics simulations by fitting a linear model to the observed mean-squared displacements (MSDs) of mobile species. MSDs derived from simulation exhibit statistical noise that causes uncertainty in the resulting estimate of . An optimal scheme for estimating minimises this uncertainty, i.e., it will have high statistical efficiency, and also gives an accurate estimate of the uncertainty itself. We present a scheme for estimating from a single simulation trajectory with high statistical efficiency and accurately estimating the uncertainty in the predicted value. The statistical distribution of MSDs observable from a given simulation is modelled as a multivariate normal distribution using an analytical covariance matrix for an equivalent system of freely diffusing particles, which we parameterise from the available simulation data. We use Bayesian regression to sample the distribution of linear models that are compatible with this multivariate normal distribution, to obtain a statistically efficient estimate of and an accurate estimate of the associated statistical uncertainty.
Paper Structure (13 sections, 42 equations, 12 figures)

This paper contains 13 sections, 42 equations, 12 figures.

Figures (12)

  • Figure 1: Example distributions of estimated self-diffusion coefficients, $\widehat{D}^*$, calculated using (a) ordinary least squares (OLS), (b) weighted least squares (WLS), and (c) generalised least squares (GLS), from MSD data from 4096.0 individual simulations of 128.0 particles undergoing a 128step 3D lattice random walk, with a step size chosen so that the true diffusion coefficient $D^* = 1$. In each panel, the grey curve shows the best-fit normal distribution for the simulation data, the upper horizontal bar shows the standard deviation of this distribution, and the lower horizontal bar shows the average estimated standard deviation given by the analytical expression for $\sigma[p(\widehat{D}^*)]$ for each regression method.
  • Figure 2: Comparison of the numerical variance in observed MSD from multiple replica simulations and the estimated variance in observed MSD given by rescaling the variance in observed squared displacements (\ref{['equ:varestMSD']}). Panel (a) shows the mean observed MSD from 4096.0 simulations of 128.0 particles undergoing a 3D lattice random walk of 128.0 steps per particle, with error bars of $\pm2\sigma[x_i]$. Panel (b) shows the MSD from just one simulation, with error bars of $\pm2\widehat{\sigma}[x_i]$, obtained via \ref{['equ:varestMSD']}. Panel (c) plots the numerical variance against the estimated variance from a single simulation as a function of timestep $i$.
  • Figure 3: (a) The numerical MSD covariance matrix $\bm{\Sigma}$ calculated using MSD data from 4096.0 simulations of 128.0 particles undergoing a 3D lattice random walk of 128.0 steps per particle. (b) The analytical MSD covariance matrix $\bm{\Sigma^\prime}$ (\ref{['equ:cvv']}), parametrised using analytical long-time limit random-walk variances $\sigma^2[x_i]$. (c) The MSD covariance matrix obtained applying the numerical scheme described in the main text to each individual random walk simulation, averaged over all 4096.0 such simulations. Colour bars in (a--c) show the covariance, $\Sigma\left[x_i, x_j\right]$. The off-diagonal panels show difference plots, computed as per-element ratios between pairs of covariance matrices (a--c).
  • Figure 4: (a) Observed MSD from a single simulation of 128 particles undergoing a 3D-lattice random walk of 128 steps per particle (dark line). The green shading shows the corresponding posterior distribution $p(\bm{m}|\bm{x})$ of linear models compatible with the observed MSD data $\bm{x}$, calculated using the scheme described in the main text. The variegated shading indicates compatibility intervals of [list-final-separator = , and ]1;2;3σ$[p(\bm{m}|\bm{x})]$. (b) The marginal posterior distribution $p(\widehat{D}^*|\bm{x})$ obtained from the posterior distribution of linear models in (a). The mean of this distribution gives the point estimate $\widehat{D}^*$ for this simulation input data. The blue horizontal bar shows an interval of one standard deviation in $p(\widehat{D}^*|\bm{x})$. (c) Probability distribution of point-estimates $p(\widehat{D}^*)$ obtained from 4096.0 individual random-walk simulations. Each simulation has been analysed as in (a) and (b) to yield a single corresponding point estimate $\widehat{D}^*$. The grey line shows the distribution of point estimates, $p(\widehat{D}^*_\mathrm{num})$, obtained using Bayesian regression with a mean vector and numerical covariance matrix derived from the complete dataset of all 4096.0 simulations. The pink horizontal bar shows an interval of one standard deviation in $p(\widehat{D}^*)$. (d) Probability distribution of estimated variances, $\widehat{\sigma}^2[\widehat{D}^*]$, for individual random-walk simulations, compared to the true sample variance (pink vertical line) $\sigma^2[\widehat{D}^*]$.
  • Figure 5: (a) Probability distribution of point estimates $p(\widehat{D}^*)$ for 192.0 effective simulations of LLZO (orange histogram). The grey line shows the distribution $p(\widehat{D}^*_\mathrm{num})$ obtained using Bayesian regression with the complete LLZO dataset as input. The pink bar shows an interval of one standard deviation $\sigma[p(\widehat{D}^*)]$. (b) Probability distribution of estimated variances, $\widehat{\sigma}^2[\widehat{D}^*]$, for individual LLZO effective simulations, compared to the true sample variance (pink vertical line) $\sigma^2[\widehat{D}^*]$.
  • ...and 7 more figures