Lost in Projection? Gaussian Filtering Recovers Hidden Conformational States

Sofia Sartore, Daniel Nagel, Georg Diez, Gerhard Stock

Abstract

To interpret molecular dynamics (MD) simulations, it is common practice to reduce the dimensionality of the molecular coordinates to a low-dimensional collective variable $x$. Projecting the high-dimensional MD data onto $x$ yields a free energy landscape $\Delta G(x)$, which highlights low-energy regions corresponding to conformational states. The accurate definition of these states, however, is often impeded by projection artifacts, resulting in artificially shortened state lifetimes or even the complete disappearance of states from the analysis. As demonstrated for a two-dimensional toy model, Gaussian low-pass filtering of the high-dimensional MD coordinates can restore the underlying free energy landscape, making it possible to recover previously hidden states. When applied to an all-atom folding trajectory of HP35, the number of microstates increases by an order of magnitude, which leads to metastable states that are long-lived and much better defined structurally, even compared to dynamically cored state trajectories.
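
The filtering step described above can be illustrated with a minimal sketch. It assumes the trajectory is stored as an array of shape (frames, coordinates) and smooths every coordinate along the time axis with a normalized Gaussian kernel before any projection is carried out. The function name `gaussian_filter_traj`, the parameterization by a kernel standard deviation `sigma` in frames, the truncation at four standard deviations, and the reflecting boundary treatment are all assumptions made for illustration; the relation between `sigma` and the filtering window $t_{\text{GF}}$ used in the paper is not specified here.

```python
import numpy as np


def gaussian_filter_traj(coords, sigma, truncate=4.0):
    """Low-pass filter each coordinate of a trajectory along the time axis.

    coords   : array-like, shape (n_frames, n_coords) or (n_frames,)
    sigma    : standard deviation of the Gaussian kernel in frames
               (assumed parameterization, not taken from the paper)
    truncate : kernel half-width in units of sigma (assumption)
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]

    # Build a normalized, symmetric Gaussian kernel.
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    # Reflect-pad in time so the filtered trajectory keeps n_frames rows.
    padded = np.pad(coords, ((radius, radius), (0, 0)), mode="reflect")

    # Convolve every coordinate separately with the same kernel.
    columns = [
        np.convolve(padded[:, j], kernel, mode="valid")
        for j in range(coords.shape[1])
    ]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    # Synthetic example: a slow oscillation buried in fast noise.
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    slow = np.sin(2 * np.pi * t / 500)
    traj = np.column_stack([
        slow + rng.normal(0.0, 0.5, t.size),
        -slow + rng.normal(0.0, 0.5, t.size),
    ])
    smooth = gaussian_filter_traj(traj, sigma=5.0)
    print(traj.shape, smooth.shape)
```

The key design point in this sketch is that the filter acts on the high-dimensional input coordinates, so that a subsequent projection onto a collective variable sees the smoothed data rather than the raw fluctuations.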

Paper Structure (2 sections, 3 equations, 6 figures)

Figures (6)

  • Figure 1: Three-well model. (a) 2D free energy landscape $\Delta G(x, y)$ as a function of the original coordinates $x$ and $y$. Indicated are the optimal 1D coordinate, $s$, directly connecting the three minima, and a suboptimal coordinate, $r$, obtained via the projection onto the vector $\boldsymbol{r}$. (b) Projection of the 2D data on the optimal reaction coordinate $s$ yields the barrier-preserving free energy curve $\Delta G(s)$. (c) Trajectory of the optimal reaction coordinate, color coded by the state that each frame is assigned to.
  • Figure 2: Effects of suboptimal 1D reaction coordinates $x$ (top) and $r$ (bottom) chosen for the 2D model. Shown are (left) 1D energy curves and (right) color-coded time traces with states assigned by cutting at the barrier. Panels (b, f) show the raw data, (c, g) the results after applying iterative coring ($t_{\text{cor}}=10\,$frames), and (d, h) the results after applying Gaussian filtering ($t_{\text{GF}}=10\,$frames). The right side of each panel shows the respective state-resolved distributions.
  • Figure 3: Implied timescales (ITSs) of the toy model, shown as a function of the lag time $\tau_{\text{lag}}$. (a) Projecting on coordinate $x$, we show uncorrected data and results from iterative coring ($t_{\text{cor}}=10\,$frames, green) and Gaussian filtering ($t_{\text{GF}}=10\,$frames, yellow). (b) Projecting on coordinate $r$, we use Gaussian filtering ($t_{\text{GF}}=4,\,10,\,20\,$frames) to recover both ITSs. In all cases we compare to the reference timescales obtained for the optimal coordinate $s$ (dashed black lines).
  • Figure 4: The folding of HP35. (a) Structure of the native state, and (b) illustration of the structurally most important 27 native contacts, ordered in seven MoSAIC clusters [diez_correlation-based_2022]. (c) Time evolution of the fraction of native contacts $Q$ obtained from the folding trajectory by Piana et al. [piana_protein_2012]. Adapted from Ref. [nagel_toward_2023].
  • Figure 5: (a) Effects of Gaussian filtering with various filtering windows $t_{\text{GF}}$ on the observed number of folding events $N_{\rm fold}$ of HP35 (green) and on the resulting number of microstates $N_{\rm states}$ obtained from robust density-based clustering [sittel_robust_2016] (black). (b) First three implied timescales of MSMs obtained from coordinates filtered with different windows (full lines), and from dynamical coring on the microstates, using different coring times (dashed lines).
  • ...and 1 more figures
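
Several of the captions above compare Gaussian filtering with coring of the state trajectory, using a coring time $t_{\text{cor}}$. As a rough illustration of the idea, the sketch below applies a simplified dwell-time criterion: a transition to a new state is accepted only if the trajectory then stays in that state for at least `t_cor` consecutive frames, and shorter excursions are reassigned to the last accepted state. This is an assumption-laden toy version, not the iterative coring procedure used for the figures; the function name `core_trajectory`, the use of the first frame as the initial state, and the handling of short runs at the end of the trajectory are all hypothetical choices.

```python
def core_trajectory(states, t_cor):
    """Suppress short-lived excursions in a discrete state trajectory.

    states : sequence of hashable state labels, one per frame
    t_cor  : minimum number of consecutive frames a new state must
             persist before the transition is accepted (in frames)
    """
    n = len(states)
    if n == 0:
        return []

    cored = []
    current = states[0]  # assumption: the first frame sets the initial state
    i = 0
    while i < n:
        label = states[i]
        if label == current:
            cored.append(current)
            i += 1
            continue

        # Measure how long the trajectory stays in the candidate state.
        j = i
        while j < n and states[j] == label:
            j += 1

        # Accept the transition only if the visit lasts long enough;
        # otherwise the frames are assigned to the previous state.
        if j - i >= t_cor:
            current = label
        cored.extend([current] * (j - i))
        i = j
    return cored


if __name__ == "__main__":
    raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
    print(core_trajectory(raw, t_cor=3))
```

In contrast to the Gaussian filter, which acts on the continuous input coordinates before clustering, this kind of correction operates on the discretized state trajectory after the states have been defined.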