On finding optimal collective variables for complex systems by minimizing the deviation between effective and full dynamics
Wei Zhang, Christof Schütte
TL;DR
The paper addresses the problem of finding optimal low-dimensional collective variables (CVs) for complex, high-dimensional Markov dynamics by introducing an effective dynamics framework and an objective based on relative entropy. It proves that, for a fixed CV map, the corresponding lower-dimensional process is the KL-minimizing surrogate to the full transition density, and it characterizes how the choice of CV affects timescales and transition rates through variational principles. The work provides explicit error bounds connecting the spectrum of the full transfer operator to that of the effective operator and shows that when eigenfunctions (or committors) factor through the CV, the corresponding timescales and rates are preserved. It also links the framework to data-driven methods (e.g., VAMPnets, MSMs, normalizing flows) and explains how these approaches implicitly learn quantities of the effective dynamics, which guides the design of new CV-learning algorithms for molecular kinetics and other complex systems. Overall, the results establish a rigorous framework for CV selection and effective-model construction, with implications for long-time simulations and for the development of improved, theory-informed data-driven methods.
Abstract
This paper is concerned with collective variables, or reaction coordinates, that map a discrete-in-time Markov process $X_n$ in $\mathbb{R}^d$ to a space of (much) smaller dimension $k \ll d$. We define the effective dynamics under a given collective variable map $\xi$ as the best Markovian representation of $X_n$ under $\xi$. The novelty of the paper is that it gives strict criteria for selecting optimal collective variables via the properties of the effective dynamics. In particular, we show that the transition density of the effective dynamics of the optimal collective variable solves a relative entropy minimization problem from a certain family of densities to the transition density of $X_n$. We also show that many transfer operator-based data-driven numerical approaches essentially learn quantities of the effective dynamics. Furthermore, we obtain various error estimates for the effective dynamics in approximating dominant timescales/eigenvalues and transition rates of the original process $X_n$, and we show how optimal collective variables minimize these errors. Our results contribute to the development of theoretical tools for the understanding of complex dynamical systems, e.g. molecular kinetics, on long timescales. These results shed light on the relations among existing data-driven numerical approaches for identifying good collective variables, and they also motivate the development of new methods.
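The idea of an effective dynamics and its effect on timescales can be illustrated in a finite-state setting. The following is a minimal sketch under stated assumptions, not the paper's continuous-state construction: the full process is a hypothetical reversible 6-state Markov chain, the collective variable is a hypothetical lumping of states into two groups, and the effective transition matrix is taken to be the stationary-weighted average of the full transition probabilities over each group. The function name `effective_transition_matrix` and all numerical values are illustrative choices.

```python
import numpy as np

def effective_transition_matrix(P, pi, labels):
    """Coarse-grain a transition matrix P under a discrete CV (state -> label).

    Assumption: the effective dynamics is the stationary-weighted average
    P_eff[I, J] = sum_{i in I} pi_i * sum_{j in J} P[i, j] / sum_{i in I} pi_i.
    """
    n, k = len(pi), labels.max() + 1
    M = np.zeros((n, k))
    M[np.arange(n), labels] = 1.0          # membership matrix of the lumping
    flux = M.T @ (pi[:, None] * P) @ M     # stationary flux between groups
    return flux / flux.sum(axis=1, keepdims=True)

# Hypothetical reversible chain: symmetric weights W with two weakly coupled blocks.
rng = np.random.default_rng(0)
A = np.ones((3, 3))
R = rng.random((6, 6))
W = np.block([[A, 0.01 * A], [0.01 * A, A]]) + 0.05 * (R + R.T) / 2
P = W / W.sum(axis=1, keepdims=True)       # row-stochastic, reversible w.r.t. pi
pi = W.sum(axis=1) / W.sum()               # stationary distribution of P

labels = np.array([0, 0, 0, 1, 1, 1])      # illustrative CV: block membership
P_eff = effective_transition_matrix(P, pi, labels)

# Eigenvalues sorted in decreasing order (real because the chain is reversible).
lam_full = np.sort(np.linalg.eigvals(P).real)[::-1]
lam_eff = np.sort(np.linalg.eigvals(P_eff).real)[::-1]
print("second eigenvalue, full chain:     ", lam_full[1])
print("second eigenvalue, effective chain:", lam_eff[1])
```

For a reversible chain, the second eigenvalue of this coarse-grained matrix cannot exceed that of the full chain, so the effective model can only underestimate the slowest implied timescale. The gap closes when the slow eigenvector is constant on each group, which is the finite-state counterpart of an eigenfunction factoring through the collective variable.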
