Machine Learning of Slow Collective Variables and Enhanced Sampling via Spatial Techniques
Tuğçe Gökdemir, Jakub Rydzewski
TL;DR
The paper addresses the challenge of identifying slow collective variables (CVs) for long-timescale dynamics, where free-energy barriers much higher than $k_{\mathrm{B}}T$ hinder sampling. It surveys spatial, trajectory-free ML methods that exploit thermodynamic structure: diffusion maps with anisotropic kernels, reweighted transitions, eigendecomposition approaches, reweighted stochastic embedding (RSE/MRSE), and spectral-map learning, along with neural-network-based enhanced sampling. It clarifies how these methods infer slow CVs from equilibrium properties, enable unbiased Markov state models, and integrate with enhanced sampling via reweighting and on-the-fly biasing. The work provides a roadmap for thermodynamics-informed CV learning as a complementary route to trajectory-based methods in molecular dynamics, with potential to improve interpretability and efficiency in exploring complex chemical systems.
Abstract
Understanding the long-time dynamics of complex physical processes depends on our ability to recognize patterns. To simplify the description of these processes, we often introduce a set of reaction coordinates, customarily referred to as collective variables (CVs). The quality of these CVs heavily impacts our comprehension of the dynamics, often influencing the estimates of thermodynamics and kinetics from atomistic simulations. Consequently, identifying CVs poses a fundamental challenge in chemical physics. Recently, significant progress has been made by leveraging the predictive ability of unsupervised machine learning techniques to determine CVs. Many of these techniques require temporal information to learn slow CVs that correspond to the long-timescale behavior of the studied process. Here, however, we focus specifically on techniques that can identify CVs corresponding to the slowest transitions between states without needing temporal trajectories as input, instead using the spatial characteristics of the data. We review the latest developments in this category of techniques and briefly outline potential directions for thermodynamics-informed spatial learning of slow CVs.
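As a minimal illustration of learning a slow CV from spatial information alone, the sketch below builds a diffusion map on toy two-state data and checks that the leading nontrivial eigenvector separates the two states. It is a sketch under stated assumptions, not the implementation of any method from the paper: the function name `diffusion_map`, the bandwidth `epsilon`, the density-normalization exponent `alpha = 1/2`, and the synthetic data are all hypothetical choices, and the samples are assumed to be unbiased, so no reweighting is applied.

```python
import numpy as np


def diffusion_map(X, epsilon, alpha=0.5, n_components=2):
    """Density-normalized diffusion map (illustrative sketch).

    X            : (n_samples, n_features) array of configurations
    epsilon      : Gaussian kernel bandwidth (assumed, problem dependent)
    alpha        : density-normalization exponent (assumed 1/2 here)
    n_components : number of nontrivial eigenpairs to return
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip small negative round-off

    # Gaussian kernel and kernel density estimate at every sample.
    K = np.exp(-d2 / (2.0 * epsilon))
    q = K.sum(axis=1)

    # Density normalization of the kernel.
    K_alpha = K / np.outer(q**alpha, q**alpha)

    # Row sums define the Markov matrix M = D^{-1} K_alpha.
    d = K_alpha.sum(axis=1)

    # Diagonalize the symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2},
    # which shares its eigenvalues with M.
    S = K_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Convert to right eigenvectors of M; the first one is constant.
    psi = evecs / np.sqrt(d)[:, None]
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]


# Toy data (hypothetical): two metastable states along x, noise along y.
rng = np.random.default_rng(0)
n = 400
state = rng.integers(0, 2, size=n) * 2 - 1          # -1 or +1
x = state + 0.25 * rng.standard_normal(n)
y = 0.25 * rng.standard_normal(n)
X = np.column_stack([x, y])

evals, psi = diffusion_map(X, epsilon=0.1)

# The leading nontrivial eigenvector acts as the slow CV candidate.
slow_cv = psi[:, 0]
corr = abs(np.corrcoef(slow_cv, X[:, 0])[0, 1])
print("leading nontrivial eigenvalues:", evals)
print("|corr(slow CV, x)|:", corr)
```

No time ordering of the samples enters the calculation; the slow coordinate emerges from how the samples are distributed in configuration space. For data generated by a biased simulation, the kernel would additionally need statistical weights, which this sketch omits.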
