Doob's Lagrangian: A Sample-Efficient Variational Approach to Transition Path Sampling

Yuanqi Du, Michael Plainer, Rob Brekelmans, Chenru Duan, Frank Noé, Carla P. Gomes, Alán Aspuru-Guzik, Kirill Neklyudov

TL;DR

Doob's Lagrangian reframes transition path sampling as a variational problem over conditioned path distributions. It introduces a simulation-free training objective and a boundary-enforcing Gaussian path parameterization that yields the Doob transform without costly trajectory simulations. The method extends to first- and second-order dynamics and to mixtures of Gaussian paths, enabling efficient sampling of transition ensembles in complex molecular systems. Across synthetic benchmarks and protein/peptide tasks, the approach achieves comparable accuracy to baselines with substantially fewer energy evaluations, highlighting its practical impact for sample-efficient TPS in chemistry and materials science.

Abstract

Rare event sampling in dynamical systems is a fundamental problem arising in the natural sciences, which poses significant computational challenges due to an exponentially large space of trajectories. For settings where the dynamical system of interest follows a Brownian motion with known drift, the question of conditioning the process to reach a given endpoint or desired rare event is definitively answered by Doob's h-transform. However, the naive estimation of this transform is infeasible, as it requires simulating sufficiently many forward trajectories to estimate rare event probabilities. In this work, we propose a variational formulation of Doob's h-transform as an optimization problem over trajectories between a given initial point and the desired ending point. To solve this optimization, we propose a simulation-free training objective with a model parameterization that imposes the desired boundary conditions by design. Our approach significantly reduces the search space over trajectories and avoids expensive trajectory simulation and inefficient importance sampling estimators which are required in existing methods. We demonstrate the ability of our method to find feasible transition paths on real-world molecular simulation and protein folding tasks.
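To make "imposes the desired boundary conditions by design" concrete, here is a minimal sketch of one way a Gaussian path marginal can be pinned at both endpoints. It assumes a mean that linearly interpolates between the endpoints plus a learned offset multiplied by a factor that vanishes at $t=0$ and $t=T$, and a standard deviation built the same way. The names `gaussian_path`, `mean_offset`, `log_scale`, and `sigma_min` are hypothetical, the two callables stand in for neural network outputs, and this is not necessarily the exact form used by the authors.

```python
import numpy as np

def gaussian_path(t, T, A, B, mean_offset, log_scale, sigma_min=1e-3):
    """Mean and std of a Gaussian marginal pinned at A (t=0) and B (t=T).

    mean_offset and log_scale are hypothetical stand-ins for learned
    networks; sigma_min is an assumed small floor on the std.
    """
    s = t / T                       # normalized time in [0, 1]
    bump = s * (1.0 - s)            # vanishes at both endpoints
    mean = (1.0 - s) * A + s * B + bump * mean_offset(t)
    softplus = np.log1p(np.exp(log_scale(t)))   # positive scale
    std = sigma_min + bump * softplus
    return mean, std

def sample_path_points(rng, ts, T, A, B, mean_offset, log_scale):
    """Draw one sample from each marginal; no dynamics are simulated."""
    points = []
    for t in ts:
        mean, std = gaussian_path(t, T, A, B, mean_offset, log_scale)
        points.append(mean + std * rng.standard_normal(A.shape))
    return np.stack(points)

# Toy 2D example with arbitrary (untrained) stand-in functions.
T = 1.0
A = np.array([-1.0, 0.0])
B = np.array([1.0, 0.0])
mean_offset = lambda t: np.array([0.0, 2.0 * np.sin(3.0 * t)])
log_scale = lambda t: np.zeros(2)

rng = np.random.default_rng(0)
ts = np.linspace(0.0, T, 5)
points = sample_path_points(rng, ts, T, A, B, mean_offset, log_scale)
```

Whatever the stand-in functions return, the mean equals `A` at $t=0$ and `B` at $t=T$, and the standard deviation collapses to `sigma_min` there, so the boundary conditions never have to be learned or enforced through a penalty term.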

Paper Structure

This paper contains 55 sections, 14 theorems, 66 equations, 12 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

[jamison1975markov] Let $h_{\mathcal{B}}(x,t) \coloneqq \rho_T(x_T \in \mathcal{B}\,|\, x_t = x)$ denote the conditional transition probability of the reference process in eq:ref_sde. Then, ...

Note that all of our subsequent results also hold when $\mathcal{B}$ is a point mass $B$, the only change being that the $h$-function becomes a density, $h_{B}(x,t) = \rho_T(B\,|\, x_t = x)$.
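As a sanity check on the point-mass definition of $h$, consider the simplest possible reference process, a driftless Brownian motion $dx_t = \sigma\, dW_t$ in $\mathbb{R}^d$. This choice and the symbol $\sigma$ are illustrative assumptions, not the paper's notation. The transition density is Gaussian, so the $h$-function and its score are available in closed form:

$$
h_B(x,t) = \rho_T(B \,|\, x_t = x) = \mathcal{N}\!\left(B;\; x,\; \sigma^2 (T-t)\, I\right),
\qquad
\nabla_x \log h_B(x,t) = \frac{B - x}{\sigma^2 (T-t)} .
$$

Adding the standard Doob correction $\sigma^2 \nabla_x \log h_B(x,t)$ to the (zero) reference drift gives

$$
dx_t = \frac{B - x_t}{T - t}\, dt + \sigma\, dW_t ,
$$

which is the Brownian bridge: the drift pulls the process toward $B$ with a strength that grows as $t \to T$, so the endpoint is reached with probability one. For a general drift no such closed form is available, which is the setting the variational formulation targets.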

Figures (12)

  • Figure 1: Given reference dynamics, transition path sampling seeks to capture the conditional or posterior distribution over paths which reach a terminal set $x_T \in \mathcal{B}$. However, simulating the reference dynamics (blue) can be wasteful since we rarely obtain paths (orange) which reach (the vicinity of) the terminal set $\mathcal{B}$. This is a major challenge for techniques based on importance sampling or Monte Carlo estimation, even when adding a control term to the reference dynamics. By contrast, our approach optimizes a tractable variational distribution over transition paths with a parameterization which satisfies the initial and terminal conditions by design.
  • Figure 2: Comparison of path histograms and trajectories obtained from TPS with fixed-length two-way shooting and from our variational approach.
  • Figure 3: Illustration of the expressivity of unimodal Gaussian versus mixture of Gaussian paths on a symmetric potential with two transition path modes.
  • Figure 4: Transition path for the protein Chignolin. The energy plot shows a transition path in which the protein folds within $T=1,000$ fs and crosses a high energy barrier of about $3,000$ kJ/mol at $460$ fs.
  • Figure 5: In the path-density panel, we compare the log-likelihood of sampled trajectories, where a higher likelihood is generally more favorable. The maximum-energy panel shows the maximum energy of each individual trajectory. A high maximum energy means that the molecule must be in an excited state during the transition, which makes the transition less likely at lower temperatures.
  • ...and 7 more figures

Theorems & Definitions (21)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Proposition 3
  • Proposition 4
  • Proposition 4
  • proof
  • Proposition 4
  • ...and 11 more