Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations
Jakub Rydzewski, Ming Chen, Tushar K. Ghosh, Omar Valsson
TL;DR
The paper tackles the challenge of learning slow collective variables (CVs) from high-dimensional atomistic data when enhanced sampling biases the distribution. It introduces diffusion reweighting, a pairwise reweighting of Markov transitions that accounts for bias via density and weight factors, enabling CV learning with diffusion maps and stochastic embeddings while recovering the equilibrium density $P(\mathbf{z})$ from biased data. Demonstrations on a simple model potential, alanine dipeptide, and chignolin show that reweighted embeddings recover metastable-state structure and produce free-energy landscapes consistent with unbiased references. This framework broadens the applicability of manifold learning to biased simulations and supports direct CV construction from enhanced-sampling data, potentially improving biasing schemes and interpretation of metastable dynamics.
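Recovering an equilibrium density from biased samples rests on standard statistical reweighting. The following minimal Python/NumPy sketch is not the authors' code. It assumes a static bias potential $V(\mathbf{x})$, so that each sample carries a weight $w \propto e^{\beta V}$. Time-dependent biases such as metadynamics need additional corrections that are not shown. It then estimates $P(z)$ along a single coordinate $z$ with a weighted histogram, which gives $F(z) = -\beta^{-1}\ln P(z)$. The function names `statistical_weights` and `free_energy_1d` are hypothetical.

```python
import numpy as np


def statistical_weights(bias, beta):
    """Normalized weights w_i proportional to exp(beta * V_i) for a static bias V."""
    logw = beta * np.asarray(bias, dtype=float)
    logw -= logw.max()              # shift before exponentiating to avoid overflow
    w = np.exp(logw)
    return w / w.sum()              # weights sum to one


def free_energy_1d(z, weights, beta, bins=50, value_range=None):
    """Weighted-histogram estimate of P(z) and F(z) = -ln P(z) / beta."""
    p, edges = np.histogram(z, bins=bins, range=value_range,
                            weights=weights, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -np.log(p) / beta       # empty bins give +inf
    f -= f[np.isfinite(f)].min()    # shift so the minimum of F is zero
    return centers, f
```

For instance, take samples of $z$ drawn uniformly on $[-3, 3]$ under a bias $V(z) = -z^2/2$, which exactly flattens an underlying potential $U(z) = z^2/2$. With $\beta = 1$, these samples reweight to $F(z) \approx z^2/2$ up to histogram noise.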
Abstract
Enhanced sampling methods are indispensable in computational physics and chemistry, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Although this issue is routinely circumvented by using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations, because the geometry and density of the learned manifold are biased. Here, we address this crucial issue with a general reweighting framework for manifold learning, based on anisotropic diffusion maps, that accounts for the learning data set being sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain that describes transition probabilities between high-dimensional samples. We show that our framework reverses the biasing effect, yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used with many manifold learning techniques on data from both standard and enhanced sampling simulations.
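The following sketch shows the Markov-chain construction behind anisotropic diffusion maps, extended with a reweighting step. It is a hypothetical Python/NumPy illustration and not necessarily the paper's exact diffusion-reweighting formula. It assumes three ingredients:

- a Gaussian kernel $G_{ij} = \exp(-\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2/\varepsilon)$,
- a weighted kernel density estimate $\rho_i = \sum_j w_j G_{ij}$,
- a pairwise factor $r_{ij} = w_i w_j/(\rho_i\rho_j)^{\alpha}$.

After row normalization, $M_{ij} \propto w_j G_{ij}\rho_j^{-\alpha}$. In the large-sample limit, these transitions therefore match those of an unbiased anisotropic diffusion map with parameter $\alpha$. The name `reweighted_diffusion_map` is hypothetical.

```python
import numpy as np


def reweighted_diffusion_map(X, weights, epsilon, alpha=0.5, n_components=2):
    """Hypothetical sketch of a reweighted anisotropic diffusion map.

    X       : (n, d) samples from a biased simulation.
    weights : (n,) statistical weights w_i of the samples.
    Assumed pairwise factor r_ij = w_i w_j / (rho_i rho_j)**alpha with
    rho_i = sum_j w_j G_ij; an illustration, not the paper's exact formula.
    Returns the leading nontrivial eigenvalues, the diffusion coordinates,
    and the Markov transition matrix M.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Gaussian kernel on pairwise squared Euclidean distances.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    G = np.exp(-sq_dist / epsilon)

    # Weighted kernel density estimate: approximates the equilibrium
    # (unbiased) density at each sample, up to a constant factor.
    rho = G @ w

    # Symmetric reweighted kernel K_ij = r_ij * G_ij.
    K = np.outer(w, w) / np.outer(rho, rho) ** alpha * G

    # Row normalization gives the Markov transition matrix M = D^{-1} K.
    d = K.sum(axis=1)
    M = K / d[:, None]

    # M is similar to the symmetric S = D^{-1/2} K D^{-1/2}, so use eigh.
    S = K / np.sqrt(np.outer(d, d))
    evals, phi = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]          # sort eigenvalues descending
    evals, phi = evals[order], phi[:, order]

    # Right eigenvectors of M are psi = D^{-1/2} phi.
    psi = phi / np.sqrt(d)[:, None]

    # Skip the trivial eigenpair (eigenvalue 1, constant psi).
    lam = evals[1:n_components + 1]
    coords = psi[:, 1:n_components + 1] * lam
    return lam, coords, M
```

The slowest nontrivial right eigenvectors of $M$, scaled by their eigenvalues, serve as candidate CVs. For two weakly connected clusters of samples, for example, the first diffusion coordinate takes opposite signs on the two clusters. The dense $O(n^2 d)$ distance computation keeps the sketch short but limits it to small data sets.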
