
Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles

Jakub Rydzewski

TL;DR

This work advances spectral map, a data-driven method that learns slow collective variables (CVs) by maximizing the spectral gap of a Markov transition operator, so that the dynamics can be described as memoryless diffusion on a free-energy landscape. Applying the framework to the folding of the FiP35 protein, the author extracts an essentially one-dimensional slow CV that captures the dominant folding/unfolding timescale and defines a transition-state ensemble through kinetic partitioning of the CV space. The learned CVs approach the Markovian limit for overdamped diffusion, coordinate-dependent diffusion only modestly perturbs the free-energy profile, and the slow CV highlights structurally meaningful regions and the key residues driving the slow dynamics. The results suggest that spectral map can yield physically interpretable reaction coordinates for complex molecular processes and offers a way to analyze feature importance and transitions, with future extensions to biased simulations via reweighting.

Abstract

Understanding the behavior of complex molecular systems is a fundamental problem in physical chemistry. To describe the long-time dynamics of such systems, which is responsible for their most informative characteristics, we can identify a few slow collective variables (CVs) while treating the remaining fast variables as thermal noise. This enables us to simplify the dynamics and treat it as diffusion in a free-energy landscape spanned by slow CVs, effectively rendering the dynamics Markovian. Our recent statistical learning technique, spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220], explores this strategy to learn slow CVs by maximizing a spectral gap of a transition matrix. In this work, we introduce several advancements into our framework, using a high-dimensional reversible folding process of a protein as an example. We implement an algorithm for coarse-graining Markov transition matrices to partition the reduced space of slow CVs kinetically and use it to define a transition state ensemble. We show that slow CVs learned by spectral map closely approach the Markovian limit for an overdamped diffusion. We demonstrate that coordinate-dependent diffusion coefficients only slightly affect the constructed free-energy landscapes. Finally, we present how spectral map can be used to quantify the importance of features and compare slow CVs with structural descriptors commonly used in protein folding. Overall, we demonstrate that a single slow CV learned by spectral map can be used as a physical reaction coordinate to capture essential characteristics of protein folding.
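Free-energy landscapes along a CV, such as the profile with a barrier of around 5 $k_{\mathrm{B}}T$ in Figure 2(b), are commonly estimated from sampled CV values as $F(z) = -k_{\mathrm{B}}T \ln p(z)$. A minimal sketch, assuming unbiased samples (no reweighting), a histogram estimate of $p(z)$, and exactly one metastable state on each side of the midpoint of the binning range; it ignores the coordinate-dependent diffusion $D(z)$ discussed in the paper. The toy data come from a synthetic double well $U(z) = 5(z^2-1)^2\,k_{\mathrm{B}}T$, whose barrier is 5 $k_{\mathrm{B}}T$ by construction, and the function names are hypothetical.

```python
import numpy as np

def free_energy_profile(z, bins=80, z_range=(-2.0, 2.0)):
    """Estimate F(z) = -ln p(z) in units of k_B T from unbiased CV samples."""
    p, edges = np.histogram(z, bins=bins, range=z_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        F = -np.log(p)                   # empty bins give +inf
    F -= F[np.isfinite(F)].min()         # shift so the global minimum is at 0
    return centers, F

def barrier_height(F):
    """Highest F between the minima of the left and right halves, measured from the
    global minimum. Assumes one metastable state on each side of the midpoint."""
    mid = len(F) // 2
    i = int(np.argmin(F[:mid]))
    j = mid + int(np.argmin(F[mid:]))
    return F[i:j + 1].max() - min(F[i], F[j])

# Toy data: rejection sampling from p(z) ~ exp(-U(z)) with U(z) = 5 (z^2 - 1)^2.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=2_000_000)
z = x[rng.uniform(size=x.size) < np.exp(-5.0 * (x**2 - 1.0) ** 2)]

centers, F = free_energy_profile(z)
barrier = barrier_height(F)
print(f"estimated barrier: {barrier:.2f} k_B T")
```

With a finite sample, the estimate fluctuates around the exact value; sparsely populated barrier bins dominate the statistical error.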

Paper Structure

This paper contains 17 sections, 22 equations, 3 figures, and 1 algorithm.

Figures (3)

  • Figure 1: Spectral map of FiP35. (a) Free-energy landscape with the folded and unfolded metastable states of FiP35, spanned by two CVs learned by spectral map. The contour lines are placed every 1 $k_{\mathrm{B}}T$. The projection with colored samples is shown below the free-energy landscape, where the folded (FS), transition (TS), and unfolded (US) states are shown in red, gray, and blue, respectively. The minimum free-energy path is shown by the black line linking the folded and unfolded states. (b) The minimum free-energy path [corresponding to the black line in (a)] shows an energy barrier of around 5 $k_{\mathrm{B}}T$ between the metastable states. (c) Eigenspectrum of the Markov transition matrix at the end of the learning procedure, showing a large spectral gap $\sigma=\lambda_1-\lambda_2=0.89$ between the first and second eigenvalues. Maximizing the spectral gap makes the first eigenvalue $\lambda_1$ nearly degenerate, while the remaining eigenvalues are close to 0 and thus negligible. The gray line with error bars shows the averages and standard deviations of eigenspectra obtained from attempts to maximize the spectral gap for $k>2$, which show no significant timescale separation.
  • Figure 2: Slow CV $z$ learned by spectral map for FiP35 folding, with the kinetic partitioning shown. The folded, unfolded, and transition states are shown in red, blue, and white, respectively. (a) Trajectory $z(t)$ of 100 $\mu$s used for learning. (b) Free-energy profile along $z$ with two metastable states ($\sigma=0.87$) separated by a barrier of around 5 $k_{\mathrm{B}}T$. (c) Coordinate-dependent diffusion coefficients $D(z)/D_0$, where $D_0$ is the diffusion coefficient in the folded state. (d) Probability $p(\mathrm{TS} \mid z)$ of being in the transition state (TS), with a maximum $p^*=0.22$, which is close to the Markovian limit of 0.25 for dynamics in the overdamped regime.
  • Figure 3: (a) Free-energy landscape $F(\mathbf{z})$ as a function of $\mathbf{z}=(z,q)$, where $z$ is the learned slow CV (spectral gap $\sigma=0.87$) and $q$ is the fraction of native contacts ($\sigma=0.72$). (b) FiP35 residues important for its transitions on longer timescales, shown on a randomly selected conformation from the folded state (shown in blue). The three strands of FiP35, which form the $\beta_1$ and $\beta_2$ sheets, are depicted. Residues are colored according to their relative importance, calculated from the spectral gaps of pairwise Euclidean distances: $r_k = \sum_l \sigma_{kl}$, where $\sigma_{kl}$ denotes the spectral gap of the distance between residues $k$ and $l$. (c) Crystallographic structure of FiP35 shown for comparison (Jäger et al., 2006).
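Figure 3(b) ranks residues by $r_k = \sum_l \sigma_{kl}$, where $\sigma_{kl}$ is the spectral gap of the distance between residues $k$ and $l$. A minimal sketch of that bookkeeping, assuming each distance time series is z-scored and turned into a Gaussian-kernel Markov matrix with a fixed bandwidth `epsilon`, with $\sigma_{kl}=\lambda_1-\lambda_2$ for two states. These choices, the helper names, and the four-"residue" toy trajectory (in which only residue 1 switches between two positions) are hypothetical and not taken from the paper.

```python
import numpy as np
from itertools import combinations

def spectral_gap_1d(x, epsilon=0.5, k=2):
    """Gap lambda_{k-1} - lambda_k of a Gaussian-kernel Markov matrix on a z-scored 1D feature."""
    x = (x - x.mean()) / x.std()
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / epsilon)
    d = K.sum(axis=1)
    lam = np.linalg.eigvalsh(K / np.sqrt(np.outer(d, d)))[::-1]  # descending, lam[0] = 1
    return lam[k - 1] - lam[k]

def residue_importance(coords, epsilon=0.5):
    """Relative importance r_k = sum_l sigma_kl, normalized so the maximum is 1.
    coords: array of shape (n_frames, n_residues, 3)."""
    n_res = coords.shape[1]
    sigma = np.zeros((n_res, n_res))
    for k, l in combinations(range(n_res), 2):
        dist = np.linalg.norm(coords[:, k] - coords[:, l], axis=1)
        sigma[k, l] = sigma[l, k] = spectral_gap_1d(dist, epsilon)
    r = sigma.sum(axis=1)
    return r / r.max()

# Hypothetical toy trajectory: four "residues" with thermal noise; only residue 1
# hops between two positions, mimicking a slow two-state transition.
rng = np.random.default_rng(1)
n_frames = 150
base = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]])
state = (np.arange(n_frames) // 25) % 2
coords = base + 0.1 * rng.standard_normal((n_frames, 4, 3))
coords[:, 1, 0] += 2.0 * state

r = residue_importance(coords)
print(np.round(r, 2))
```

Because residue 1 enters every distance that changes between the two states, it receives the highest relative importance ($r=1$), while distances that only fluctuate around a single value contribute small gaps.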