Kinetically Consistent Coarse Graining using Kernel-based Extended Dynamic Mode Decomposition

Vahid Nateghi, Feliks Nüske

TL;DR

The work tackles coarse-graining of reversible stochastic dynamics by learning a kinetically faithful diffusion in a reduced coarse-grained (CG) space using kernel-based gEDMD, so that the slow transition timescales $t_i=1/\lambda_i$ and the metastable structure are preserved. A diffusion learning step identifies an effective diffusion $a^{\xi}(z)$ as a linear combination of random Fourier features, while a spectral assessment via the reduced generator $\hat{\mathbf{L}}^{\alpha}_r$ validates kinetic fidelity against the full model. The framework is complemented by force matching to obtain the effective potential $F^{\xi}$ and is demonstrated on a 2D Lemon-Slice model and on MD data for alanine dipeptide and Chignolin, recovering the metastable states and dominant timescales with kinetic and thermodynamic consistency. The results indicate that state-dependent diffusion in CG space is crucial for faithful kinetics, and the approach provides a practical, data-driven route to scalable CG models that retain essential dynamical properties. The method paves the way for more transferable, higher-dimensional CG mappings and potential extensions to underdamped dynamics or memory effects.

Abstract

In this paper, we show how kernel-based models for the Koopman generator -- the gEDMD method -- can be used to identify coarse-grained dynamics on reduced variables, which retain the slowest transition timescales of the original dynamics. The centerpiece of this study is a learning method to identify an effective diffusion in coarse-grained space, which is similar in spirit to the force matching method. By leveraging the gEDMD model for the Koopman generator, the kinetic accuracy of the CG model can be evaluated. By combining this method with a suitable learning method for the effective free energy, such as force matching, a complete model for the effective dynamics can be inferred. Using a two-dimensional model system and molecular dynamics simulation data of alanine dipeptide and the Chignolin mini-protein, we demonstrate that the proposed method successfully and robustly recovers the essential kinetic and also thermodynamic properties of the full model. The parameters of the method can be determined using standard model validation techniques.
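
As a rough illustration of the diffusion learning step, the sketch below fits a scalar, state-dependent diffusion as a linear combination of random Fourier features by ridge-regularised least squares, in the same regression spirit as force matching. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`rff_features`, `fit_diffusion`), the plain Gaussian-kernel features, the hyperparameters, and the synthetic targets are all hypothetical. In particular, it assumes that noisy per-sample values of the local diffusion along the CG coordinate are already available as regression targets.

```python
import numpy as np


def rff_features(z, W, b):
    """Random Fourier features for a Gaussian kernel; z has shape (m, d)."""
    p = W.shape[1]
    return np.sqrt(2.0 / p) * np.cos(z @ W + b)


def fit_diffusion(z, targets, p=100, bandwidth=1.0, reg=1e-3, seed=0):
    """Least-squares fit of a(z) ~ phi(z)^T c (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    d = z.shape[1]
    # Frequencies drawn from the spectral density of a Gaussian kernel.
    W = rng.normal(scale=1.0 / bandwidth, size=(d, p))
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)
    Phi = rff_features(z, W, b)
    # Ridge-regularised normal equations for the coefficient vector c.
    c = np.linalg.solve(Phi.T @ Phi + reg * np.eye(p), Phi.T @ targets)

    def a_hat(z_new):
        return rff_features(z_new, W, b) @ c

    return a_hat


def true_a(z):
    """Synthetic ground-truth diffusion, used only for this toy example."""
    return 1.0 + 0.5 * np.sin(z[:, 0])


# Toy data: a one-dimensional CG coordinate with noisy local diffusion samples.
rng = np.random.default_rng(1)
z = rng.uniform(-np.pi, np.pi, size=(4000, 1))
targets = true_a(z) + 0.05 * rng.normal(size=len(z))

a_hat = fit_diffusion(z, targets)
z_test = np.linspace(-3.0, 3.0, 50)[:, None]
err = np.max(np.abs(a_hat(z_test) - true_a(z_test)))
print(f"max abs error on test grid: {err:.3f}")
```

The fit does not enforce positivity of the learned diffusion, and a periodic coordinate such as a dihedral angle would call for periodic features; both are left out to keep the sketch short.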

Paper Structure (30 sections, 39 equations, 16 figures, 2 tables, 1 algorithm)

Figures (16)

  • Figure 1: Approximation of the generator for the Lemon-Slice system. (a) Potential field. (b) Membership analysis using $1000$ samples. (c) Dominant eigenvalues of the reference generator $\hat{\mathbf{L}}_r$ and of the learned generator $\hat{\mathbf{L}}_\alpha^\xi$ built upon the learned effective diffusion, using Gaussian and periodic Gaussian kernels. (d) Relative error of these eigenvalues compared to the reference.
  • Figure 2: Application of Algorithm 1 to identify angular dynamics for the Lemon-Slice system. Effective force in (a), effective diffusion in (b), effective drift in (c), and integration of an example trajectory, using both the reference and learned SDE, in (d).
  • Figure 3: Dominant eigenvalues of the generator, using models built on simulation data of the learned coarse-grained dynamics with state-dependent diffusion (SDD, orange) and with constant diffusion (CD, green). As a comparison, we show the eigenvalues of the generator $\hat{\mathbf{L}}_r$ using the original dataset (blue). Note that the first eigenvalue is omitted as it is zero.
  • Figure 4: Graphical representation of the alanine dipeptide molecule on the left, and the reference free energy profile in two-dimensional dihedral angle space on the right.
  • Figure 5: Approximation of generator for alanine dipeptide. The dominant timescales corresponding to the reference generator $\hat{\mathbf{L}}_r$ and the learned generator $\hat{\mathbf{L}}_\alpha^\xi$ built upon the learned effective diffusion on the left, and the relative error of these timescales compared to the reference is shown on the right.
  • ...and 11 more figures