
Generalized Flow Matching for Transition Dynamics Modeling

Haibo Wang, Yuxuan Qiu, Yanze Wang, Rob Brekelmans, Yuanqi Du

TL;DR

A generalized flow matching framework is formulated that learns a vector field to sample probable paths between the two marginal densities under the learned energy function and iteratively refines the model by assigning importance weights to the sampled paths and buffering more likely paths for training.

Abstract

Simulating transition dynamics between metastable states is a fundamental challenge in dynamical systems and stochastic processes, with wide real-world applications in understanding protein folding, chemical reactions, and neural activity. However, the computational challenge often lies in sampling exponentially many paths, of which only a small fraction ends in the target metastable state due to the existence of high energy barriers. To amortize the cost, we propose a data-driven approach to warm up the simulation by learning nonlinear interpolations from local dynamics. Specifically, we infer a potential energy function from local dynamics data. To find plausible paths between two metastable states, we formulate a generalized flow matching framework that learns a vector field to sample probable paths between the two marginal densities under the learned energy function. Furthermore, we iteratively refine the model by assigning importance weights to the sampled paths and buffering more likely paths for training. We validate the effectiveness of the proposed method in sampling probable paths on both synthetic and real-world molecular systems.
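The refinement step described in the abstract (weight sampled paths, then buffer the more likely ones) can be illustrated with a small sketch. This is not the paper's implementation: the weighting rule used here, a self-normalized Boltzmann factor of the highest energy along a path, is an assumption made for illustration, and `double_well`, `bumped_path`, `path_barrier`, `importance_weights`, and `select_likely_paths` are hypothetical names. The toy potential stands in for the learned energy function.

```python
import math

def double_well(point):
    """Toy 2D potential with minima at (-1, 0) and (1, 0); stands in for a learned energy."""
    x, y = point
    return (x * x - 1.0) ** 2 + y * y

def bumped_path(amplitude, n_steps=20):
    """Hypothetical candidate path from (-1, 0) to (1, 0) with a sinusoidal detour in y."""
    path = []
    for i in range(n_steps + 1):
        t = i / n_steps
        path.append((-1.0 + 2.0 * t, amplitude * math.sin(math.pi * t)))
    return path

def path_barrier(path, potential):
    """Highest potential energy visited along a discretized path."""
    return max(potential(p) for p in path)

def importance_weights(paths, potential, kT=1.0):
    """Self-normalized weights proportional to exp(-barrier / kT) (assumed form)."""
    barriers = [path_barrier(p, potential) for p in paths]
    lowest = min(barriers)  # shift by the minimum for numerical stability
    raw = [math.exp(-(b - lowest) / kT) for b in barriers]
    total = sum(raw)
    return [w / total for w in raw]

def select_likely_paths(paths, weights, k):
    """Keep the k highest-weight paths, e.g. to place in a training buffer."""
    order = sorted(range(len(paths)), key=lambda i: weights[i], reverse=True)
    return [paths[i] for i in order[:k]]

# Usage: four candidate paths with increasingly large detours.
candidates = [bumped_path(a) for a in (0.0, 0.5, 1.0, 2.0)]
weights = importance_weights(candidates, double_well)
buffer = select_likely_paths(candidates, weights, k=2)
```

In this toy setting the straight path crosses the lowest barrier and therefore receives the largest weight; paths with larger detours climb higher in energy and are dropped from the buffer.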

Paper Structure

This paper contains 26 sections, 2 theorems, 37 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Taking the expectation of the conditional objective over $(x_0, x_T) \sim p_{0,T}$ and enforcing that $p_{0,T}\in\Pi(\mu_0,\mu_T)$ satisfies the boundary conditions yields an upper bound on $\mathcal{L}_{\text{GFM}}$,

Figures (11)

  • Figure 1: Both datasets in the figure contain 2000 pairs of data points, randomly sampled from simulations of 4K and 12K steps, respectively.
  • Figure 2: Sampled paths from models trained on both the shorter-run and longer-run datasets (saddle points are starred).
  • Figure 3: Alanine Dipeptide qualitative evaluation. Fifty randomly sampled transition paths are shown for both parameterizations, in Cartesian and internal coordinate systems, with two learned potential energies. Each model is trained on 30,000 data points sampled uniformly from a 1.2 ns simulation in each metastable state.
  • Figure 4: Alanine Dipeptide low-energy path visualization. A total of 500 timesteps from one metastable state to another, passing through an energy barrier.
  • Figure 5: Linear interpolation paths for the Longer-run dataset with and without OT. Saddle points are labeled. Only 100 paths are selected.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Definition 1
  • Proposition 1
  • Definition 2
  • Lemma 1
  • proof
  • proof