Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations

Jakub Rydzewski, Ming Chen, Tushar K. Ghosh, Omar Valsson

TL;DR

The paper tackles the challenge of learning slow collective variables (CVs) from high-dimensional atomistic data when enhanced sampling biases the distribution. It introduces diffusion reweighting, a pairwise reweighting of Markov transitions that accounts for bias via density and weight factors, enabling CV learning with diffusion maps and stochastic embeddings while recovering the equilibrium density $P(\mathbf{z})$ from biased data. Demonstrations on a simple model potential, alanine dipeptide, and chignolin show that reweighted embeddings recover metastable-state structure and produce free-energy landscapes consistent with unbiased references. This framework broadens the applicability of manifold learning to biased simulations and supports direct CV construction from enhanced-sampling data, potentially improving biasing schemes and interpretation of metastable dynamics.

Abstract

Enhanced sampling methods are indispensable in computational physics and chemistry, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Although this issue is routinely circumvented by using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations, because the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect, yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.
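To make the idea of a reweighted Markov chain over high-dimensional samples more concrete, the following is a minimal sketch of a diffusion map in which each pairwise Gaussian kernel entry is multiplied by a weight factor and divided by a density factor before row normalization. The exact kernel form used here (geometric mean of the statistical weights, weighted kernel density estimate, exponent `alpha`) is an assumption chosen for illustration and is not taken from the text above; the function name `reweighted_diffusion_map`, the bandwidth `eps`, and the synthetic data are likewise hypothetical.

```python
import numpy as np


def reweighted_diffusion_map(x, weights, eps, alpha=0.5, n_coords=2):
    """Illustrative reweighted diffusion map (assumed kernel form, see lead-in)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    w = np.asarray(weights, dtype=float)

    # Gaussian kernel on pairwise squared Euclidean distances.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    g = np.exp(-d2 / eps)

    # Assumed weight factor: geometric mean of the two statistical weights.
    r = np.sqrt(np.outer(w, w))

    # Assumed density factor: weighted kernel density estimate at each sample.
    rho = g @ w
    k = r * g / np.outer(rho, rho) ** alpha

    # Row-normalize to obtain a Markov transition matrix.
    d = k.sum(axis=1)
    m = k / d[:, None]

    # Diagonalize the symmetric conjugate of m so the spectrum is real,
    # then map its eigenvectors back to right eigenvectors of m.
    s = k / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(s)
    order = np.argsort(evals)[::-1]
    evals = evals[order][: n_coords + 1]
    psi = (evecs[:, order] / np.sqrt(d)[:, None])[:, : n_coords + 1]

    # Diffusion coordinates lambda_l * psi_l, including the trivial l = 0 pair.
    return m, evals, psi * evals


# Synthetic usage example: 200 samples in 3 dimensions with a made-up bias
# (in units of kT) that defines the weights w = exp(bias).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
bias = 0.5 * np.sum(x**2, axis=1)
w = np.exp(bias - bias.max())  # constant shift only rescales all weights
m, evals, coords = reweighted_diffusion_map(x, w, eps=1.0)
```

Setting all weights to one in this sketch gives a standard density-normalized diffusion map, which is a convenient sanity check when comparing reweighted and non-reweighted embeddings.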

Paper Structure

This paper contains 25 sections, 50 equations, 4 figures, 1 table, 2 algorithms.

Figures (4)

  • Figure 1: Target mapping from high-dimensional samples of configuration variables $\mathbf{x}$ to a low-dimensional manifold spanned by CVs $\mathbf{z}$. In our framework, learning CVs is equivalent to finding the optimal parametrization of the target mapping $\mathbf{z}=\xi(\mathbf{x})$ [Eq. (\ref{eq:xtoz})]. The target mapping performs the reduction from $\mathbb{R}^n$ to $\mathbb{R}^d$ such that the relation $p_{kl}$ between the high-dimensional samples $\mathbf{x}_k$ and $\mathbf{x}_l$ is preserved in the relation $q_{kl}$ between the CV samples $\mathbf{z}_k$ and $\mathbf{z}_l$ in the low-dimensional manifold. For a detailed discussion, see Secs. \ref{sec:diffrew} and \ref{sec:rse}.
  • Figure 2: Diffusion maps generated from a biased simulation of a particle in a simple one-dimensional potential, with and without diffusion reweighting. ($a$) The potential $U(x)$, where the energy barriers separating the deepest minimum are on the order of 50 $k_{\mathrm{B}}T$, so that transitions out of this state are rare events. ($b$) A comparison between the non-reweighted (blue) and reweighted (red) diffusion maps: the equilibrium densities along the coordinate $x$ and the diffusion coordinates $\lambda_0\psi_0$ vs. $\lambda_1\psi_1$, colored according to the $x$ value. The enhanced sampling simulation is performed using well-tempered metadynamics [barducci2008well] with a bias factor of 10, employing the pesmd code in the PLUMED plugin [plumed, plumed-nest].
  • Figure 3: Reweighted diffusion maps on a peptide model system (Ace-Ala-Nme) in vacuum at 300 K, simulated using well-tempered metadynamics [barducci2008well] biasing the $\Phi$ and $\Psi$ dihedral angles with a bias factor $\gamma=5$. The diffusion map is calculated using a high-dimensional space of 45 pairwise distances between heavy atoms. ($a$) A representative structure of alanine dipeptide with the dihedral angles $\Phi$ and $\Psi$. ($b$) The spectrum of eigenvalues $\{\lambda_l\}$ obtained from the eigendecomposition of the non-reweighted (blue) and reweighted (red) Markov transition matrices. ($c$) The samples shown in the dihedral angle space for the non-reweighted (blue label) and reweighted (red label) diffusion maps, with colors representing the first and second diffusion-map coordinates $\lambda_0\psi_0(\mathbf{x})$ and $\lambda_1\psi_1(\mathbf{x})$, respectively. The color bar represents the constructed diffusion coordinates.
  • Figure 4: Reweighted stochastic embeddings calculated for chignolin in the TIP3P solvent at 340 K simulated using the CHARMM27 force field. Low-dimensional manifolds are colored according to their free energy. ($a$) Representative conformations from the metastable states estimated by the reweighted embedding methods are shown around the MRSE embedding. ($b$) The embedding obtained using StKE. Well-tempered metadynamics is used to generate the training set consisting of sines and cosines of all $\Phi$ and $\Psi$ dihedral angles, amounting to 32 variables in total. The training set is generated by performing a 1-$\mu$s simulation with a bias factor $\gamma=20$, enhancing the fluctuations of the distance $d$ between the C$\alpha$ atoms of residues Y1 and Y10 and the radius of gyration $r_g$. ($c$) The free-energy surface calculated along $d$ and $r_g$. The axes and units for the embeddings are arbitrary and thus not shown. See SI (Sec. S1 C) for computational details.
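
Several of the captions above compare equilibrium densities or free energies obtained from biased data. As a rough illustration of how such a quantity can be estimated, the sketch below builds a weighted histogram along a single variable and converts it to a free energy, $F(z) = -k_{\mathrm{B}}T \ln P(z)$. It assumes a static bias with weights $w = e^{V/k_{\mathrm{B}}T}$, which simplifies the time-dependent bias of well-tempered metadynamics; the function name `reweighted_free_energy` and the harmonic toy model are hypothetical and not part of the paper.

```python
import numpy as np


def reweighted_free_energy(cv, bias, kT=1.0, bins=50, range_=None):
    """Estimate F(z) = -kT ln P(z) from biased samples with weights exp(bias / kT)."""
    cv = np.asarray(cv, dtype=float)
    bias = np.asarray(bias, dtype=float)

    # Shift the bias by its maximum before exponentiating for numerical
    # stability; a constant shift rescales all weights by the same factor.
    w = np.exp((bias - bias.max()) / kT)

    # Weighted, normalized histogram approximates the unbiased density P(z).
    hist, edges = np.histogram(cv, bins=bins, range=range_, weights=w, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Empty bins yield +inf free energy; silence the log(0) warning.
    with np.errstate(divide="ignore"):
        f = -kT * np.log(hist)
    f -= f[np.isfinite(f)].min()  # set the lowest free energy to zero
    return centers, f


# Toy check: U(x) = x^2 / 2 with bias V(x) = -x^2 / 4 (kT = 1), so the biased
# ensemble is a Gaussian with variance 2 and reweighting should recover U(x).
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(2.0), size=200_000)
bias = -0.25 * x**2
centers, f = reweighted_free_energy(x, bias, kT=1.0, bins=40, range_=(-2.0, 2.0))
```

In this toy model the recovered profile `f` should follow `0.5 * centers**2` up to statistical noise and binning error; for real metadynamics data the weights would have to be computed with the reweighting scheme appropriate to the biasing method.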