From Biased to Unbiased Dynamics: An Infinitesimal Generator Approach

Timothée Devergne, Vladimir Kostic, Michele Parrinello, Massimiliano Pontil

TL;DR

The work addresses extracting spectral properties of Langevin-type dynamics when only biased simulations are affordable, by learning the infinitesimal generator through its resolvent. It develops a debiasing framework that leverages the Radon-Nikodym relationship between biased and unbiased measures and optimizes a regularized energy kernel to recover leading generator eigenpairs, with a ridge regression estimator $G=(W+\gamma I)^{-1}C$ guiding the computation. A neural-network extension learns expressive dictionaries $z^\theta$ to capture slow modes, supported by a theoretical guarantee that, under boundedness and sufficient approximation capacity, the leading eigenpairs converge with high probability. Empirical evaluations on one- and two-dimensional benchmarks and a small biomolecule suite demonstrate superior performance over transfer-operator methods and competitive results with recent generator-learning approaches, even when biasing yields only a few transitions. The method promises practical impact for uncovering transition mechanisms and timescales in complex molecular systems, and it invites extensions to time-dependent bias and large-scale applications.

Abstract

We investigate learning the eigenfunctions of evolution operators for time-reversal invariant stochastic processes, a prime example being the Langevin equation used in molecular dynamics. Many physical or chemical processes described by this equation involve transitions between metastable states separated by high potential barriers that can hardly be crossed during a simulation. To overcome this bottleneck, data are collected via biased simulations that explore the state space more rapidly. We propose a framework for learning from biased simulations rooted in the infinitesimal generator of the process and the associated resolvent operator. We contrast our approach to more common ones based on the transfer operator, showing that it can provably learn the spectral properties of the unbiased system from biased data. In experiments, we highlight the advantages of our method over transfer operator approaches and recent developments based on generator learning, demonstrating its effectiveness in estimating eigenfunctions and eigenvalues. Importantly, we show that even with datasets containing only a few relevant transitions due to sub-optimal biasing, our approach recovers relevant information about the transition mechanism.

Paper Structure

This paper contains 20 sections, 8 theorems, 64 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\mathcal{D}_n=(x_i')_{i\in[n]}$ be the biased dataset generated from $\pi'$. Let $w(x)=e^{\beta V(x)}$ and define the empirical covariances w.r.t. the empirical distribution $\widehat{\pi}'=n^{-1}\sum_{i\in[n]}\delta_{x_i'}$ by […]. Compute the eigenpairs $(\nu_i,v_i)_{i\in[m]}$ of the RR estimator $\widehat{\textsc{G}}_{\eta,\gamma}=(\widehat{\textsc{W}}+\eta\gamma\textsc{I}\,\ldots$
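
The theorem statement is truncated here, so the following is only a minimal sketch of the kind of computation it describes, not the authors' implementation. It assumes a one-dimensional overdamped Langevin process, a fixed polynomial dictionary instead of a learned one, and the following forms, none of which are confirmed by the text above: reweighted covariance $C=\mathbb{E}_{\pi'}[w\,zz^\top]/\mathbb{E}_{\pi'}[w]$, energy matrix $W=\eta C+\beta^{-1}\mathbb{E}_{\pi}[\nabla z\,\nabla z^\top]$, estimator $G=(W+\eta\gamma I)^{-1}C$, and the resolvent relation $\lambda_i=\eta-1/\nu_i$ between the eigenvalues $\nu_i$ of $G$ and the generator eigenvalues. The function and argument names are hypothetical.

```python
import numpy as np

def debiased_generator_eigs(x, bias_potential, features, grad_features,
                            beta=1.0, eta=1.0, gamma=1e-6):
    """Sketch: generator eigenvalues of the UNBIASED 1D dynamics from BIASED samples x.

    Assumptions (not taken from the source): W = eta*C + D with D the
    reweighted gradient covariance divided by beta, and lambda = eta - 1/nu.
    """
    v_bias = bias_potential(x)
    # Importance weights w(x) = exp(beta * V(x)); shifting by the max only
    # rescales the weights, which cancels after normalization.
    w = np.exp(beta * (v_bias - v_bias.max()))
    w = w / w.sum()

    z = features(x)        # shape (n, m): dictionary evaluated on the samples
    dz = grad_features(x)  # shape (n, m): derivative of each dictionary function

    C = (z * w[:, None]).T @ z            # reweighted covariance
    D = (dz * w[:, None]).T @ dz / beta   # reweighted Dirichlet (energy) form
    W = eta * C + D

    m = z.shape[1]
    G = np.linalg.solve(W + eta * gamma * np.eye(m), C)  # ridge-regularized estimator

    nu, vecs = np.linalg.eig(G)
    nu, vecs = nu.real, vecs.real         # spectrum is real up to round-off
    order = np.argsort(-nu)               # largest nu = slowest mode
    lam = eta - 1.0 / nu[order]           # assumed resolvent-to-generator map
    return lam, vecs[:, order]

# Toy check: unbiased potential U(x) = x^2 / 2 (Ornstein-Uhlenbeck, beta = 1),
# whose generator eigenvalues are 0, -1, -2, ...
# The bias V(x) = -x^2 / 4 flattens the well, so biased samples follow N(0, 2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, np.sqrt(2.0), size=200_000)

lam, _ = debiased_generator_eigs(
    x,
    bias_potential=lambda s: -s**2 / 4.0,
    features=lambda s: np.stack([np.ones_like(s), s, s**2], axis=1),
    grad_features=lambda s: np.stack([np.zeros_like(s), np.ones_like(s), 2.0 * s], axis=1),
)
print(lam)  # expected to be close to 0, -1, -2 up to sampling error
```

In this toy setting the span of $\{1, x, x^2\}$ is invariant under the generator, so a fixed dictionary suffices; for realistic systems the source describes learning the dictionary with a neural network instead.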

Figures (8)

  • Figure 1: Pipeline of our method: from biased simulations to timescales and metastable states.
  • Figure 2: Müller-Brown potential. Comparison of the first two relevant ground-truth eigenfunctions of the potential (first column) with this work (second column), the transfer operator approach deepTICA (third column), and the work of Zhang et al. (2022) (fourth column). The x and y axes are the coordinates of the system, and points are colored according to the value of the eigenfunction. The underlying potential is represented by the white level lines. Associated eigenvalues $\lambda$ are also reported.
  • Figure 3: Alanine dipeptide. Results of our method trained on dataset 1. a) and b) First and second eigenfunctions represented on dataset 1, in the plane of the $\phi$ and $\psi$ dihedral angles. c) First eigenfunction represented on dataset 2, in the plane of the $\phi$ and $\theta$ dihedral angles, indicating that our method is effective even when trained on data generated with poor CVs (see text for more discussion). In all three panels, points are colored according to the value of the eigenfunction. d) Comparison of our method with the committor ($q$) of peilin
  • Figure 4: Our method applied to the chignolin miniprotein. The data points are shown in the plane of two distances: the distance between the nitrogen atom of residue 3, ASP (ASP3N), and the oxygen atom of residue 7, Gly (Gly7O), and the distance between ASP3N and the oxygen atom of residue 8, THR (THR8). Together these distances allow the folded and unfolded states to be visualized.
  • Figure 5: Typical behavior of the loss function during a training.
  • ...and 3 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Proposition 1: DK1970
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Theorem 2
  • proof
  • Theorem 2
  • proof