Machine Learning of Slow Collective Variables and Enhanced Sampling via Spatial Techniques

Tuğçe Gökdemir; Jakub Rydzewski

TL;DR

The paper addresses the challenge of identifying slow collective variables (CVs) for long-timescale dynamics, where free-energy barriers well above $k_{\mathrm{B}}T$ hinder sampling. It surveys spatial, trajectory-free ML methods that exploit thermodynamic structure, including diffusion maps with anisotropic kernels, reweighted transitions, eigendecomposition approaches, reweighted stochastic embedding (RSE/MRSE), and spectral map learning, and it also covers neural-network-based enhanced sampling. It clarifies how these methods infer slow CVs from equilibrium properties, enable unbiased Markov state models, and integrate with enhanced sampling via reweighting and on-the-fly biasing. The work provides a roadmap for thermodynamics-informed CV learning as a complementary route to trajectory-based methods in molecular dynamics, with potential to improve interpretability and efficiency in exploring complex chemical systems.

Abstract

Understanding the long-time dynamics of complex physical processes depends on our ability to recognize patterns. To simplify the description of these processes, we often introduce a set of reaction coordinates, customarily referred to as collective variables (CVs). The quality of these CVs heavily impacts our comprehension of the dynamics, often influencing the estimates of thermodynamics and kinetics from atomistic simulations. Consequently, identifying CVs poses a fundamental challenge in chemical physics. Recently, significant progress was made by leveraging the predictive ability of unsupervised machine learning techniques to determine CVs. Many of these techniques require temporal information to learn slow CVs that correspond to the long timescale behavior of the studied process. Here, however, we specifically focus on techniques that can identify CVs corresponding to the slowest transitions between states without needing temporal trajectories as input, instead using the spatial characteristics of the data. We discuss the latest developments in this category of techniques and briefly discuss potential directions for thermodynamics-informed spatial learning of slow CVs.

Paper Structure (13 sections, 26 equations, 3 figures)

Introduction
Background
Collective Variables
Timescale Separation
Enhanced Sampling
Spatial Learning
Anisotropic Kernels
Reweighted Transitions
Eigendecomposition
Reweighted Stochastic Embedding
Spectral Map
Enhanced Sampling via Neural Networks
Summary

Figures (3)

Figure 1: Model potential with two metastable states whose long-time behavior can effectively be described by the slow variable $x_{\mathrm{s}}$, with the fast variable $x_{\mathrm{f}}$ responsible only for fluctuations within the states. The corresponding eigenspectrum of the diffusion generator $\lambda_k=\operatorname{e}^{-\mu_k}$ shows timescale separation, which is indicated by the spectral gap $\lambda_{k-1} - \lambda_k$, where $k=2$ is the number of states.
Figure 2: Learning CVs with spatial techniques. Diagram of a neural network showing the difference between reweighted stochastic embedding (RSE) and spectral map. RSE estimates transition matrices $M(\mathbf{x}_k,\mathbf{x}_l)$ and $Q(\mathbf{z}_k,\mathbf{z}_l)$ in both $\mathbf{x}$ and $\mathbf{z}$ spaces, respectively (as $\mathbf{x}$ space can consist of variables different than the microscopic coordinates, we denote it as features). Then, it uses the Kullback--Leibler (KL) divergence as a loss function to minimize differences between pairs of transition probabilities in $\mathbf{x}$ and $\mathbf{z}$ spaces. In contrast, spectral map constructs a transition matrix only in $\mathbf{z}$ space. Next, it performs an eigendecomposition of $Q$ to calculate the spectral gap between neighboring eigenvalues ($\Delta\lambda_{m-1,m}$ where $m$ is the number of states in $\mathbf{z}$ space) and maximizes it to improve timescale separation between slow and fast variables.
Figure 3: Free energy landscape of the FiP35 protein constructed from slow CVs learned with spectral map (right). The slow CVs discriminate between the folded state (FS) and the unfolded state (US), which are separated by the transition state (TS) near the free energy barrier. The most important physical interactions in FiP35, which consists of two $\beta$ sheets, as identified by spectral map are shown in blue (left). [Figure based on Rydzewski, "Spectral Map for Slow Collective Variables, Markovian Dynamics, and Transition State Ensembles," J. Chem. Theory Comput. (2024). Copyright 2024 Author, licensed under Creative Commons Attribution 4.0.]
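
The spectral gap that appears in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain Gaussian kernel with a hand-picked scale `eps` (rather than the anisotropic or reweighted kernels the paper surveys), synthetic two-state samples, and two fixed one-dimensional candidate CVs in place of a trained neural network. The names `markov_eigenvalues` and `spectral_gap` are hypothetical. Under these assumptions, the Markov matrix built on the slow variable $x_{\mathrm{s}}$ should show a larger gap $\lambda_{k-1} - \lambda_k$ (with $k=2$) than the one built on the fast variable $x_{\mathrm{f}}$.

```python
import numpy as np


def markov_eigenvalues(z, eps):
    """Eigenvalues (descending) of the row-normalized Gaussian-kernel matrix on samples z."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dist / eps)
    degree = kernel.sum(axis=1)
    # M = D^{-1} K has the same eigenvalues as the symmetric matrix
    # D^{-1/2} K D^{-1/2}, which eigvalsh can diagonalize stably.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    return np.linalg.eigvalsh(sym)[::-1]


def spectral_gap(eigenvalues, n_states):
    """Gap lambda_{k-1} - lambda_k for k = n_states, counting from lambda_0 = 1."""
    return eigenvalues[n_states - 1] - eigenvalues[n_states]


# Synthetic data (an assumption, not taken from the paper):
# x_s is bimodal (two metastable states), x_f only fluctuates around zero.
rng = np.random.default_rng(0)
n = 300
x_s = rng.choice([-1.0, 1.0], size=n) + 0.15 * rng.standard_normal(n)
x_f = 0.15 * rng.standard_normal(n)

lam_slow = markov_eigenvalues(x_s, eps=0.1)
lam_fast = markov_eigenvalues(x_f, eps=0.1)
gap_slow = spectral_gap(lam_slow, n_states=2)
gap_fast = spectral_gap(lam_fast, n_states=2)

print("gap along x_s:", gap_slow)
print("gap along x_f:", gap_fast)
```

In a spectral-map-style setup, the candidate CV would be the output of a neural network and the gap would serve as the training objective; here it is only evaluated for two fixed projections to show why a larger gap signals better timescale separation.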
