Nonparametric Reaction Coordinate Optimization with Histories: A Framework for Rare Event Dynamics

Polina V. Banushkina, Sergei V. Krivov

TL;DR

It is demonstrated that rare event dynamics can be accurately characterized without extensive sampling of the configuration space, establishing a general, flexible, and robust framework for analyzing complex dynamical systems and longitudinal datasets.

Abstract

Rare but critical events in complex systems, such as protein folding, chemical reactions, disease progression, and extreme weather or climate phenomena, are governed by complex, high-dimensional, stochastic dynamics. Identifying an optimal reaction coordinate (RC) that accurately captures the progress of these dynamics is crucial for understanding and simulating such processes. However, determining an optimal RC for realistic systems is notoriously difficult, due to methodological challenges that limit the success of standard machine learning techniques. These challenges include the absence of ground truth, the lack of a loss function for general nonequilibrium dynamics, the difficulty of selecting expressive neural network architectures that avoid overfitting, the irregular and incomplete nature of many real-world trajectories, limited sampling, and the extreme data imbalance inherent in rare event problems. Here, we introduce a nonparametric RC optimization framework that incorporates trajectory histories and circumvents these challenges, enabling robust analysis of irregular or incomplete data without requiring extensive sampling. The power of the method is demonstrated through increasingly challenging analyses of protein folding dynamics, where it yields accurate committor estimates that pass stringent validation tests and produce high-resolution free energy profiles. Its generality is further illustrated through applications to phase space dynamics, a conceptual ocean circulation model, and a longitudinal clinical dataset. These results demonstrate that rare event dynamics can be accurately characterized without extensive sampling of the configuration space, establishing a general, flexible, and robust framework for analyzing complex dynamical systems and longitudinal datasets.

Paper Structure

This paper contains 25 sections, 5 equations, 7 figures.

Figures (7)

  • Figure 1: RC optimization with histories using a complete set of CVs. Free energy as a function of the committor $q$ (a) and the natural committor $\tilde{q}$, where $D(\tilde{q})=\text{const}$ (b). (c) Validation criterion $Z_q(q, k \Delta t_0)$ remains approximately constant across different lag times (time-scales) $k=1,2,...,2^{16}$, indicating that the putative optimal RC closely approximates the committor. (d) Predicted vs. observed committors for the full trajectory, as well as for its first (1/2) and second (2/2) halves, show good agreement; the halves are used to estimate statistical fluctuations.
  • Figure 2: RC optimization using an incomplete set of CVs. Results are in very good agreement with those shown in Fig. 1. Notation as in Fig. 1.
  • Figure 3: RC optimization for a highly irregular ensemble of trajectories, mimicking a clinical dataset. Results are in good agreement with those shown in Figs. 1 and 2. (a) Free energy as a function of the committor; (b) Validation criterion $Z_q$ is relatively constant up to statistical fluctuations; curves are shifted to have zero mean for clarity. (c) Predicted vs. observed committors disagree, as most trajectories do not reach the boundary states. (d) Histogram of trajectories reaching state $A$ ($n_A$), state $B$ ($n_B$), or neither ($n$ discarded).
  • Figure 4: Mean first-passage time (MFPT) RC optimization using a single variable, the RMSD time series. Free energy as a function of the MFPT $\tau$ (a) and the natural MFPT $\tilde{\tau}$, where $D(\tilde{\tau})=\text{const}$ (b). Note that the native basin lies on the left, as the native state corresponds to $\tau=0$. (c) Validation criterion $Z_\tau$ is relatively constant ($\pm 4\%$), indicating that the putative optimal RC closely approximates the MFPT. (d) Predicted and observed MFPTs are in good agreement. Free energy profiles along the MFPT (e) and natural MFPT (f), determined using a complete set of CVs, are shown for comparison.
  • Figure 5: RCs as functions of phase space for underdamped Langevin dynamics: (a) Validation criterion ($Z_q$) remains approximately constant ($\pm 0.6\%$), up to statistical fluctuations. (b) Predicted and observed committors agree. (c) Committor as a function of phase space. (d) Backward committor $q_-$ as a function of phase space.
  • ...and 2 more figures
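
Several captions above compare predicted and observed committors. As a rough illustration of what such a comparison involves, the sketch below uses a toy symmetric random walk, for which the committor is known exactly, and estimates the observed committor by labeling each trajectory frame with the boundary state it reaches next. This is not the authors' method or code: the toy system, the function names `simulate_walk` and `observed_committor`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def simulate_walk(n_steps, n_states, rng):
    """Symmetric random walk on states 0..n_states-1 with reflecting ends (toy model)."""
    x = np.empty(n_steps, dtype=int)
    x[0] = n_states // 2
    steps = rng.choice([-1, 1], size=n_steps)
    for t in range(1, n_steps):
        nxt = x[t - 1] + steps[t]
        # Reflect at the ends so one long trajectory samples many transitions.
        if nxt < 0:
            nxt = 1
        elif nxt > n_states - 1:
            nxt = n_states - 2
        x[t] = nxt
    return x

def observed_committor(x, a, b, n_states):
    """Per state: fraction of frames whose trajectory next reaches b before a."""
    n = len(x)
    # label[t] = 1 if b is reached next, 0 if a is reached next, -1 if neither
    label = np.full(n, -1)
    last = -1
    for t in range(n - 1, -1, -1):  # backward sweep propagates the next boundary hit
        if x[t] == b:
            last = 1
        elif x[t] == a:
            last = 0
        label[t] = last
    known = label >= 0  # frames after the final boundary visit are discarded
    obs = np.full(n_states, np.nan)
    for s in range(n_states):
        mask = known & (x == s)
        if mask.any():
            obs[s] = label[mask].mean()
    return obs

n_states = 11
rng = np.random.default_rng(0)
traj = simulate_walk(200_000, n_states, rng)
observed = observed_committor(traj, a=0, b=n_states - 1, n_states=n_states)
# For a symmetric walk the exact committor is linear in the state index.
predicted = np.arange(n_states) / (n_states - 1)
max_deviation = np.nanmax(np.abs(observed - predicted))
print(f"max |observed - predicted| = {max_deviation:.3f}")
```

Frames recorded after the last visit to either boundary carry no label and are dropped, which is the same kind of loss the Figure 3 caption describes for trajectories that reach neither state $A$ nor state $B$.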