The fast committor machine: Interpretable prediction with kernels
D. Aristoff, M. Johnson, G. Simpson, R. J. Webber
TL;DR
The paper addresses estimating the forward committor $q^*(\boldsymbol{x}) = \boldsymbol{P}_{\boldsymbol{x}}(T_B<T_A)$ from trajectory data in metastable Markov systems. It introduces the fast committor machine (FCM), a kernel-based, interpretable estimator that combines an adaptive linear map $\boldsymbol{M}$, learned via the Recursive Feature Machine (RFM), with coefficients $\boldsymbol{\theta}$, learned via randomly pivoted Cholesky. The result is a training cost that is linear in the number of data points $N$ and a model that reveals low-dimensional active subspaces. Key contributions include (i) extending RFM to the committor with a new kernel form that enforces boundary conditions and emphasizes $A \to B$ transitions; (ii) demonstrating higher accuracy and faster training than a neural network with the same parameter budget on a triple-well potential and alanine dipeptide; (iii) showing that the learned $\boldsymbol{M}^{1/2}$ identifies a small active subspace (often two-dimensional) that dominates committor gradients. This approach enables scalable, interpretable committor estimation in high-dimensional molecular systems and motivates adaptive sampling strategies to further improve data efficiency.
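To make the active-subspace claim concrete, the following is a minimal sketch of how one might read a low-dimensional subspace off a symmetric positive semidefinite matrix $\boldsymbol{M}$. It is not the paper's code: the function name `active_subspace`, the choice `k=2`, and the synthetic 5-dimensional matrix are all assumptions made for illustration. The sketch relies only on the fact that $\boldsymbol{M}$ and $\boldsymbol{M}^{1/2}$ share eigenvectors, with the eigenvalues of $\boldsymbol{M}^{1/2}$ being the square roots of those of $\boldsymbol{M}$.

```python
import numpy as np

def active_subspace(M, k=2):
    """Return the top-k eigenvectors of a symmetric PSD matrix M and the
    fraction of trace(M) carried by the corresponding eigenvalues.

    Hypothetical helper for illustration; not taken from the paper.
    """
    evals, evecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]        # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    frac = evals[:k].sum() / evals.sum()
    return evecs[:, :k], frac

# Synthetic stand-in for a learned M (an assumption, not real data):
# two dominant directions hidden in a random orthonormal basis of R^5.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
M = Q @ np.diag([10.0, 5.0, 0.01, 0.01, 0.01]) @ Q.T

U, frac = active_subspace(M, k=2)
print(f"top-2 directions carry {frac:.1%} of trace(M)")
```

In this synthetic case the two leading eigenvectors recover the planted two-dimensional subspace. With a matrix learned from data, the eigenvalue decay would indicate how many directions are worth keeping.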
Abstract
In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration $x$ will reach a set $B$ before a set $A$. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces that optimally describe the $A$ to $B$ transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly in the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural net.
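As a toy illustration of the committor definition only, and not of the FCM itself, the sketch below computes $q(x) = P_x(T_B < T_A)$ exactly for a nearest-neighbor random walk on $\{0, \dots, n\}$ with $A = \{0\}$ and $B = \{n\}$. The function name `committor_random_walk` and the discrete setting are assumptions chosen because the answer is known in closed form: for the symmetric walk, $q(x) = x/n$. The interior values satisfy $q(x) = p\,q(x+1) + (1-p)\,q(x-1)$ with boundary values $q(0) = 0$ and $q(n) = 1$, which gives a small linear system.

```python
import numpy as np

def committor_random_walk(n, p=0.5):
    """Committor q(x) = P_x(T_B < T_A) for a walk on {0, ..., n} with
    A = {0}, B = {n}, stepping right with probability p, left otherwise.

    Toy example for illustration; not the paper's estimator.
    """
    m = n - 1                      # number of interior states 1, ..., n-1
    L = np.eye(m)
    rhs = np.zeros(m)
    for i in range(m):             # row i corresponds to state x = i + 1
        if i + 1 < m:
            L[i, i + 1] = -p       # right neighbor is an interior state
        else:
            rhs[i] = p             # right neighbor is B, where q = 1
        if i - 1 >= 0:
            L[i, i - 1] = -(1 - p) # left neighbor is an interior state
        # otherwise the left neighbor is A, where q = 0 (no contribution)
    q = np.zeros(n + 1)
    q[n] = 1.0                     # boundary condition on B
    q[1:n] = np.linalg.solve(L, rhs)
    return q

q = committor_random_walk(10)      # symmetric walk: expect q(x) = x / 10
print(q)
```

In high-dimensional continuous systems such as alanine dipeptide, no such direct solve is available, which is the setting in which a data-driven estimator like the FCM is used.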
