Generative modeling of conditional probability distributions on the level-sets of collective variables

Fatima-Zahrae Akhyar, Wei Zhang, Gabriel Stoltz, Christof Schütte

Abstract

Given a probability distribution $μ$ in $\mathbb{R}^d$ represented by data, we study in this paper the generative modeling of the corresponding conditional probability distributions on the level-sets of a collective variable $ξ:\mathbb{R}^d \rightarrow \mathbb{R}^k$, where $1 \le k<d$. We propose a general and efficient learning approach that can learn generative models on different level-sets of $ξ$ simultaneously. To improve the learning quality on level-sets in low-probability regions, we also propose a data enrichment strategy by utilizing data from enhanced sampling techniques. We demonstrate the effectiveness of our proposed learning approach through concrete numerical examples. The proposed approach is potentially useful for the generative modeling of molecular systems in biophysics.

Paper Structure

This paper contains 17 sections, 25 equations, 16 figures.

Figures (16)

  • Figure 1: (a) Scatter plot of the 2D dataset sampled using the make_circles function from scikit-learn. (b) Density estimate of the corresponding CV map $\xi(x) = x_1^2 + x_2^2$.
  • Figure 2: Results for the 2D dataset. (a) Mean value of $\xi$ on the generated samples compared to the intended CV value. (b) Mean deviation of $\xi$ on generated samples from the target CV value. (c) Proportion of samples with positive $x_1$ values in the original and in the generated samples for different target CV values.
  • Figure 3: Evolution of the two-dimensional samples under the ODE flow for a fixed target CV value $z=0.6$. The panels show the particle positions at different integration times $t$, illustrating how the initial Gaussian cloud progressively morphs into the target distribution.
  • Figure 4: Heat map and contour representation of the Müller-Brown potential landscape. The two major low-potential regions and the shallow low-potential region are shown in dark blue and light blue, respectively, while the contour lines highlight the specific potential energy levels.
  • Figure 5: (a) Trajectory samples from unbiased sampling overlaid on level-sets of the learned CV map $\xi$. The color bar indicates the corresponding CV values across the configuration space. (b) Scatter plot of the potential energy versus the CV value.
  • ...and 11 more figures
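
To make the quantities in the captions of Figures 1 and 2 concrete, the following is a minimal sketch of the CV map $\xi(x) = x_1^2 + x_2^2$ evaluated on a two-circle dataset, together with the fraction of samples with positive $x_1$ near a given level-set. It is not the authors' code: `sample_circles` is a hypothetical pure-Python stand-in for scikit-learn's `make_circles`, and the parameters `factor`, `noise`, `seed` and the tolerance `tol` are assumptions chosen for illustration only.

```python
import math
import random


def sample_circles(n_samples=1000, factor=0.5, noise=0.05, seed=0):
    """Hypothetical stand-in for a make_circles-style dataset.

    Points alternate between an outer circle of radius 1 and an inner
    circle of radius `factor`, with Gaussian noise added per coordinate.
    """
    rng = random.Random(seed)
    points = []
    for i in range(n_samples):
        radius = 1.0 if i % 2 == 0 else factor
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x1 = radius * math.cos(angle) + rng.gauss(0.0, noise)
        x2 = radius * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x1, x2))
    return points


def xi(x):
    """CV map from the Figure 1 caption: xi(x) = x_1^2 + x_2^2."""
    return x[0] ** 2 + x[1] ** 2


def positive_x1_fraction(points, z, tol):
    """Fraction of points with x_1 > 0 among those with |xi(x) - z| <= tol.

    Returns None when no point falls within the tolerance band.
    """
    near = [p for p in points if abs(xi(p) - z) <= tol]
    if not near:
        return None
    return sum(1 for p in near if p[0] > 0) / len(near)


if __name__ == "__main__":
    data = sample_circles()
    cv_values = [xi(p) for p in data]
    print("min/max CV value:", min(cv_values), max(cv_values))
    print("positive-x1 fraction near z = 1:", positive_x1_fraction(data, 1.0, 0.1))
```

The tolerance band around $z$ is only a crude empirical proxy for conditioning on the level-set $\{\xi = z\}$; it is used here because exact level-set membership has probability zero for noisy data.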