Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge

Ziyang Yu, Wenbing Huang, Yang Liu

TL;DR

The Pretrained Variational Bridge (PVB) is an encoder-decoder model that maps the initial structure into a noised latent space and transports it toward stage-specific targets through augmented bridge matching, enabling consistent use of cross-domain structural knowledge across training stages.

Abstract

Molecular Dynamics (MD) simulations provide a fundamental tool for characterizing molecular behavior at full atomic resolution, but their applicability is severely constrained by their computational cost. To address this, a surge of deep generative models has recently emerged to learn dynamics at coarsened timesteps for efficient trajectory generation, yet they either generalize poorly across systems or, due to the limited molecular diversity of trajectory data, fail to fully exploit structural information to improve generative fidelity. Here, we present the Pretrained Variational Bridge (PVB), an encoder-decoder model that maps the initial structure into a noised latent space and transports it toward stage-specific targets through augmented bridge matching. This unifies training on both single-structure and paired trajectory data, enabling consistent use of cross-domain structural knowledge across training stages. For protein-ligand complexes, we further introduce a reinforcement learning-based optimization via adjoint matching that speeds progression toward the holo state, which supports efficient post-optimization of docking poses. Experiments on proteins and protein-ligand complexes demonstrate that PVB faithfully reproduces thermodynamic and kinetic observables from MD while delivering stable and efficient generative dynamics.
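
To make the bridge matching idea in the abstract concrete, the following is a minimal NumPy sketch of the generic Brownian bridge construction that bridge matching methods typically regress against. It is not the PVB implementation: the function names, the toy coordinates, the noise scale `sigma`, and the way `y0` and `y1` are produced are all illustrative assumptions. In the augmented variant, the learned drift is usually also conditioned on the starting point, which the sketch only notes in a comment.

```python
import numpy as np


def brownian_bridge_sample(y0, y1, t, sigma, rng):
    """Sample Y_t from a Brownian bridge pinned at y0 (t=0) and y1 (t=1)."""
    mean = (1.0 - t) * y0 + t * y1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(y0.shape)


def bridge_drift_target(y_t, y1, t):
    """Conditional drift (y1 - y_t) / (1 - t) that steers y_t to y1 at t=1."""
    return (y1 - y_t) / (1.0 - t)


rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): 5 "atoms" in 3D instead of a real structure.
x0 = rng.standard_normal((5, 3))                  # initial structure
y0 = x0 + 0.1 * rng.standard_normal(x0.shape)     # placeholder for a noised latent
y1 = x0 + 0.5 * rng.standard_normal(x0.shape)     # placeholder for a stage target

t = 0.3
y_t = brownian_bridge_sample(y0, y1, t, sigma=0.2, rng=rng)

# A drift network would be trained to regress onto this target; an augmented
# variant would typically receive (t, y_t, y0) as inputs rather than (t, y_t).
target = bridge_drift_target(y_t, y1, t)
```

Integrating the target drift over the remaining time `1 - t` without noise lands exactly on `y1`, which is a quick sanity check that the interpolant and the drift are consistent with each other.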

Paper Structure

This paper contains 47 sections, 2 theorems, 27 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

Denote by $p_e^*$ the minimizer of eq:loss_kl, ${\mathbb{P}}^*$ the path measure associated with eq:abm_sde, and $p_d^*$ the probability density function of the conditional measure ${\mathbb{P}}_{1|0}^*$. Then the following equality holds:

Figures (7)

  • Figure 1: Two variants of PVB. Blue arrows show coarse-grained sampling that traverses unfolded and folded states to reproduce the Boltzmann distribution, while purple arrows denote accelerated transitions that drive rapid access from the initial state to the target folded state via reinforcement learning.
  • Figure 2: Schematic of the overall PVB workflow. a. The unified framework for pretraining on single-structure data and finetuning on paired trajectory data. The encoder $\varphi_e$ maps the initial state ${\mathbf{X}}_0$ to the latent variable ${\mathbf{Y}}_0$, which the decoder $\varphi_d$ then propagates to the stage-specific target ${\mathbf{Y}}_1$ via eq:abm_sde. b. RL finetuning with $\varphi_e$ frozen and $\varphi_d^u$ initialized from the finetuned $\varphi_d$, aiming to accelerate exploration of protein-ligand holo states. Given the predicted next state ${\mathbf{Y}}_1$ generated by $\varphi_e$ and $\varphi_d^u$, the lean adjoint state $\tilde{a}_1$ is first computed from the reward function (i.e., the root mean square error to the holo state ${\mathbf{X}}_{\text{ref}}$), and then propagated backward to $\tilde{a}_t$ by solving eq:lean_adjoint_state with diffusion time $t$; the result is subsequently used to update $\varphi_d^u$ according to eq:loss_adj.
  • Figure 3: Illustration of generated trajectories for PDB 2bjq (row 1), PDB 7rm7 (row 2), CATH domain 3er0A02 (row 3), and CATH domain 1pyaA00 (row 4). Left: Representative structures from the first 10 frames of the generated trajectories. Middle: Free energy surfaces projected onto TIC0 and TIC1. Right: Probability differences between PVB and MD across the 10 metastable states estimated by MSM.
  • Figure 4: Illustration of ligand RMSD and CoM along the trajectories for PDB 2ww0 (top row) and PDB 5n2f (bottom row). Left: Visualization of the complex, with the receptor shown in blue and the ligand in green. Middle: Ligand RMSD over time. Right: Pocket-ligand CoM distance over time.
  • Figure 5: Comparison of the predicted holo state with the co-crystal structure for a. PDB 6d15 and b. PDB 6j0g. The holo protein structure is shown in blue, and the displayed ligand pose is presented after aligning the predicted protein to the holo structure, with the ligand transformed accordingly.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2