The fast committor machine: Interpretable prediction with kernels

D. Aristoff, M. Johnson, G. Simpson, R. J. Webber

TL;DR

The paper addresses estimating the forward committor $q^*(\boldsymbol{x}) = \boldsymbol{P}_{\boldsymbol{x}}(T_B < T_A)$ from trajectory data in metastable Markov systems. It introduces the fast committor machine (FCM), a kernel-based, interpretable estimator that uses an adaptive linear map $\boldsymbol{M}$ learned via the Recursive Feature Machine (RFM) and coefficients $\boldsymbol{\theta}$ learned via randomly pivoted Cholesky, achieving training cost linear in the number of data points $N$ and revealing low-dimensional active subspaces. Key contributions include (i) extending RFM to the committor with a new kernel form that enforces boundary conditions and emphasizes $A \to B$ transitions; (ii) demonstrating higher accuracy and faster training than a neural network with the same parameter budget on the triple-well and alanine dipeptide systems; (iii) showing that the learned $\boldsymbol{M}^{1/2}$ identifies a small active subspace (often two-dimensional) that dominates committor gradients. This approach enables scalable, interpretable committor estimation in high-dimensional molecular systems and motivates adaptive sampling strategies to further improve data efficiency.
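To make the idea of an active subspace concrete, here is a minimal sketch that is not taken from the paper. It assumes the scaling matrix is an average gradient outer product, as in the generic RFM recipe; the FCM's exact update rule may differ. The test function `f`, the dimension `d`, and the sample size `n` are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 500                      # hypothetical dimension and sample size
X = rng.standard_normal((n, d))

# Hypothetical target f(x) = sin(x_0) + x_1^2, which varies only along
# coordinates 0 and 1. Its gradient is therefore zero in all other coordinates.
G = np.zeros((n, d))
G[:, 0] = np.cos(X[:, 0])
G[:, 1] = 2.0 * X[:, 1]

# Assumed RFM-style scaling matrix: the average gradient outer product.
M = G.T @ G / n

# Eigendecomposition of the symmetric PSD matrix, sorted in descending order.
evals, evecs = np.linalg.eigh(M)
evals, evecs = evals[::-1], evecs[:, ::-1]
evals = np.clip(evals, 0.0, None)   # guard against tiny negative round-off

# Fraction of the trace captured by the two leading eigenvectors.
explained = evals[:2].sum() / evals.sum()

# Symmetric square root M^{1/2}, built from the eigendecomposition.
M_half = (evecs * np.sqrt(evals)) @ evecs.T

print(f"trace fraction in top-2 subspace: {explained:.4f}")
```

In this toy setting the two leading eigenvectors span the only directions along which `f` changes, which is the sense in which a learned $\boldsymbol{M}^{1/2}$ can expose a two-dimensional active subspace.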

Abstract

In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration $x$ will reach a set $B$ before a set $A$. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces that optimally describe the $A$ to $B$ transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly in the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.
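As a small illustration of the committor definition, and not of the FCM itself, the sketch below computes the committor exactly for a nearest-neighbor random walk on $\{0, \dots, n\}$ with $A = \{0\}$ and $B = \{n\}$. The function name `committor_random_walk` and the choice of chain are assumptions made for this example. It relies on the standard fact that the committor satisfies $q = Pq$ away from $A \cup B$, with $q = 0$ on $A$ and $q = 1$ on $B$.

```python
import numpy as np

def committor_random_walk(n: int, p_right: float = 0.5) -> np.ndarray:
    """Committor q(i) = P_i(T_B < T_A) for a walk on {0, ..., n}.

    Here A = {0}, B = {n}, and the walk steps right with probability
    p_right and left with probability 1 - p_right.
    """
    # Transition matrix; only rows of interior states are needed.
    P = np.zeros((n + 1, n + 1))
    for i in range(1, n):
        P[i, i + 1] = p_right
        P[i, i - 1] = 1.0 - p_right

    interior = np.arange(1, n)
    # On interior states q = P q, with q(0) = 0 and q(n) = 1, so
    # (I - P_II) q_I = P_IB * 1.
    lhs = np.eye(n - 1) - P[np.ix_(interior, interior)]
    rhs = P[interior, n]

    q = np.zeros(n + 1)
    q[n] = 1.0
    q[interior] = np.linalg.solve(lhs, rhs)
    return q

q = committor_random_walk(10)
print(q)
```

For the symmetric walk the result is the linear profile $q(i) = i/n$, the classical gambler's ruin probability. In high-dimensional systems no such closed form is available, which is why the committor has to be estimated from trajectory data.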

Paper Structure (17 sections, 2 theorems, 39 equations, 7 figures, 1 table, 2 algorithms)

Key Result

Theorem A.1

Define the least-squares loss function $\bm{L}_{\gamma}(\bm{c}, \bm{d})$, where the positive semidefinite kernel matrix consists of four blocks. Then $\bm{L}_{\gamma}(\bm{c}, \bm{d})$ has a minimizer that satisfies $\bm{c} + \bm{d} = \bm{0}$.

Figures (7)

  • Figure 1: (a) The potential function $V_0$. (b) The reference committor evaluated on the validation data points with states $A$ and $B$ and the committor one-half surface indicated in red.
  • Figure 2: Comparison of neural net (NN) and FCM performance for the triple-well system, with standard error bars computed from $10$ independent runs of the FCM. (a) Mean squared error computed using the reference committor. (b) Runtime in seconds.
  • Figure 3: Training performance for single instances of the triple-well experiment, with error computed using the reference committor from Fig. 1. (a) The FCM with different approximation ranks $r$. (b) The neural net with different learning rates $lr$.
  • Figure 4: Square root of scaling matrix for the triple-well system when $N = 10^6$ and $r = 1000$. (a) After 1 iteration. (b) After 5 iterations, corresponding to convergence.
  • Figure 5: (a) Free energy surface of alanine dipeptide in $\phi$ and $\psi$ coordinates, compared to reference committor half-surface (red). (b) The reference committor in $\phi$ and $\psi$ coordinates with states $A$ and $B$ and the committor one-half surface indicated in red.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem A.1
  • proof
  • Proposition B.1
  • proof