Table of Contents
Fetching ...

Extrapolating molecular dynamics simulations to zero time step and across thermodynamic space

Kush Coshic, Gerhard Hummer

Abstract

The integration time step is a critical determinant of performance in molecular dynamics simulations, governing the trade-off between speed and fidelity. Although 2 fs remains the standard in atomistic biomolecular simulations, the push for performance has popularized a 4 fs time step with hydrogen mass repartitioning, often combined with multiple time stepping or mass rescaling. However, it is often unclear whether a chosen protocol is overly aggressive, as the apparent numerical stability of a trajectory can mask underlying thermodynamic inaccuracies. Increasing the time step will exacerbate systematic discretization errors, inherent to all numerical integration algorithms. In the widely used Verlet family of integrators, these errors manifest as $\mathcal{O}(Δt^2)$ deviations in thermodynamic observables such as potential energy and volume, and for common Langevin splitting schemes, even temperature. We demonstrate that these deviations follow a simple, linear thermodynamic model, allowing for their rigorous removal by extrapolation to the zero time step limit. In turn, the time-step dependence provides us with estimates of the system heat capacity, compressibility, and thermal expansion coefficient. This framework allows us to construct consistent probability distributions of energy and volume across thermodynamic states, effectively recovering Boltzmann-consistent statistics at a target condition independent of time step. These considerations are particularly important for enhanced sampling methods such as replica exchange and umbrella sampling, which rely on rigorous Boltzmann sampling and require accurate energies and temperatures for valid replica exchange probabilities and statistical reweighting.

Extrapolating molecular dynamics simulations to zero time step and across thermodynamic space

Abstract

The integration time step is a critical determinant of performance in molecular dynamics simulations, governing the trade-off between speed and fidelity. Although 2 fs remains the standard in atomistic biomolecular simulations, the push for performance has popularized a 4 fs time step with hydrogen mass repartitioning, often combined with multiple time stepping or mass rescaling. However, it is often unclear whether a chosen protocol is overly aggressive, as the apparent numerical stability of a trajectory can mask underlying thermodynamic inaccuracies. Increasing the time step will exacerbate systematic discretization errors, inherent to all numerical integration algorithms. In the widely used Verlet family of integrators, these errors manifest as deviations in thermodynamic observables such as potential energy and volume, and for common Langevin splitting schemes, even temperature. We demonstrate that these deviations follow a simple, linear thermodynamic model, allowing for their rigorous removal by extrapolation to the zero time step limit. In turn, the time-step dependence provides us with estimates of the system heat capacity, compressibility, and thermal expansion coefficient. This framework allows us to construct consistent probability distributions of energy and volume across thermodynamic states, effectively recovering Boltzmann-consistent statistics at a target condition independent of time step. These considerations are particularly important for enhanced sampling methods such as replica exchange and umbrella sampling, which rely on rigorous Boltzmann sampling and require accurate energies and temperatures for valid replica exchange probabilities and statistical reweighting.
Paper Structure (14 sections, 20 equations, 5 figures)

This paper contains 14 sections, 20 equations, 5 figures.

Figures (5)

  • Figure 1: Thermodynamic properties depend on the time step. Probability density distributions of (A) temperature, (B) system volume, (C) configurational enthalpy, and (D) total energy obtained from simulations of 5,184 TIP3P water molecules, with varying integration time steps ($\mathrm{\Delta t}$) as per legend. Results in (B-D) are per water molecule. Simulations were performed for 1 $\mathrm{\mu}$s at a set temperature, $T_\mathrm{set}=310$ K and pressure, $P_\mathrm{set}=1$ bar. For smoothing, the normalized distributions are presented as Gaussian kernel density estimates.
  • Figure 2: Time step dependence of temperature, volume, and energy. (A) Simulation averaged temperature, (B) volume per water molecule, and (C) total energy (T.E.) per water molecule, respectively, as functions $\Delta t^2$. Circles represent time-averaged values from individual simulations performed at different combinations of {$\Delta t$, $T_\mathrm{set}$, $P_\mathrm{set}$}. Symbol color represents $T_\mathrm{set}$, defined in the color bar. The cross symbols ($\times$) indicate corresponding predictions from the fitted thermodynamic model evaluated at the simulated conditions. Dashed lines represent the continuous model predictions for each unique combination of {$T_\mathrm{set}$, $P_\mathrm{set}$} across the full range of $\Delta t^2$ values. (D) Fitted model parameters calculated per water. Uncertainties correspond to standard errors derived from the covariance matrix of the weighted global least-squares fit.
  • Figure 3: Thermodynamic model collapses disparate distributions of enthalpy. Probability density of the configurational enthalpy per water molecule, for the system of $N=5184$ TIP3P water molecules, plotted on a semi-logarithmic scale. Faint dashed lines with open markers show the raw distributions from simulations performed at various time steps ($\Delta t = 0.1$--$4.0$ fs), temperatures ($T_\mathrm{set} = 310$--$318$ K), and pressures ($p_\mathrm{set} = 1$--$50$ bar). Solid markers with black outlines represent the same data after correction to a common reference state ($\Delta t \to 0$, $T_{\mathrm{ref}}=310$ K, $p_{\mathrm{ref}}=1$ bar) using the fitted thermodynamic model. The gray shaded area represents a Gaussian distribution centered at the theoretical zero time step mean enthalpy, $H_0 = U_0 + p V_0$, with a variance determined by the fitted heat capacity, $\sigma_{H_\mathrm{conf}}^2 = k_B T_{\mathrm{ref}}^2 C_{p_\mathrm{conf}} / N$, where $C_{p_\mathrm{conf}} = C_p -3R$.
  • Figure 4: Time step dependence of the thermodynamic model for a protein in solution. (A) Ubiquitin molecule (PDB ID: 1UBQ) in a water box. (B) Simulation averaged temperature, (C) volume per unit mass (Å$^3$/amu), and (D) total energy (T.E.) per unit mass (J/g), respectively, as functions of $\Delta t^2$. Symbols, colors, and dashed lines carry same meaning as in Fig. \ref{['fig:fig2']}A-C. (E) Fitted model parameters. Uncertainties correspond to standard errors derived from the covariance matrix of the weighted global least-squares fit.
  • Figure 5: Thermodynamic model collapses disparate distributions of enthalpy for protein in water. Probability density distributions of the configurational enthalpy per unit mass for the protein system, plotted on a semi-logarithmic scale. Symbols, lines, and the shaded region carry same meaning as defined in Fig. \ref{['fig:fig3']}. Marker shape and color distinguish different systems, as per legend.