Split-Flows: Measure Transport and Information Loss Across Molecular Resolutions
Sander Hummerich, Tristan Bereau, Ullrich Köthe
TL;DR
Coarse-grained molecular models accelerate simulations but discard microscopic detail. The amount discarded is quantified by the mapping entropy, defined over the fiber of fine-grained configurations that map to the same coarse-grained state. The authors introduce split-flows, a continuous-time normalizing-flow framework that augments coarse-grained states with noise and learns a bijective map to fine-grained configurations. This allows backmapping by sampling from the conditional distribution $\pi_{r|R}$ and makes the configuration-dependent mapping entropy tractable to estimate, because the density change along the flow is available either as $\log |\det J|$ of the transport map or, equivalently, in its divergence form. The method is demonstrated on chignolin, a lipid bilayer, and alanine dipeptide, combining two-sided flow matching with fiber sampling to deliver accurate reconstructions and a principled assessment of information loss across resolutions. The approach provides a tool for evaluating and improving coarse-grained models, suggests autoregressive or related strategies as routes to larger biomolecules, and connects to thermodynamic quantities through the potential of mean force (PMF) decomposition $W(R)=E(R)-TS(R)$.
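The equivalence between $\log |\det J|$ and its divergence form can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical linear velocity field $v(x)=Ax$ in place of a learned network, a two-dimensional coarse-grained state `R` augmented with one noise coordinate `z`, and illustrative helper names (`velocity`, `integrate`, `flow_jacobian`). It integrates the flow from $t=0$ to $t=1$ and compares the time integral of the divergence with the log-determinant of the Jacobian of the resulting map.

```python
import numpy as np

# Hypothetical linear velocity field v(x) = A x; a learned network would replace this.
A = np.array([[0.3, -0.5, 0.1],
              [0.2, -0.1, 0.4],
              [0.0, 0.3, 0.2]])


def velocity(x, A):
    """Toy velocity field; its divergence is trace(A) everywhere."""
    return A @ x


def integrate(x0, A, n_steps=200):
    """RK4 integration of dx/dt = v(x) on [0, 1], accumulating the integral of div v."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    div_integral = 0.0
    for _ in range(n_steps):
        k1 = velocity(x, A)
        k2 = velocity(x + 0.5 * dt * k1, A)
        k3 = velocity(x + 0.5 * dt * k2, A)
        k4 = velocity(x + dt * k3, A)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        div_integral += dt * np.trace(A)  # divergence of a linear field is constant
    return x, div_integral


def flow_jacobian(x0, A, eps=1e-4):
    """Central finite-difference Jacobian of the map x(0) -> x(1)."""
    n = x0.size
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        x_plus, _ = integrate(x0 + e, A)
        x_minus, _ = integrate(x0 - e, A)
        J[:, i] = (x_plus - x_minus) / (2.0 * eps)
    return J


# Augment an assumed 2-D coarse-grained state R with one Gaussian noise coordinate z.
rng = np.random.default_rng(0)
R = np.array([1.0, -0.5])
z = rng.standard_normal(1)
x0 = np.concatenate([R, z])

x1, div_integral = integrate(x0, A)
_, log_det = np.linalg.slogdet(flow_jacobian(x0, A))

# The two quantities agree, so log p_1(x1) = log p_0(x0) - div_integral.
print("integral of divergence:", div_integral)
print("log|det J|:            ", log_det)
```

For a learned, nonlinear velocity field the divergence is no longer constant and would have to be evaluated along the trajectory, for example with automatic differentiation; the bookkeeping of the log-density stays the same.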
Abstract
By reducing resolution, coarse-grained models greatly accelerate molecular simulations, unlocking access to long-timescale phenomena, though at the expense of microscopic information. Recovering this fine-grained detail is essential for tasks that depend on atomistic accuracy, making backmapping a central challenge in molecular modeling. We introduce split-flows, a novel flow-based approach that reinterprets backmapping as a continuous-time measure transport across resolutions. Unlike existing generative strategies, split-flows establish a direct probabilistic link between resolutions, enabling expressive conditional sampling of atomistic structures and, for the first time, a tractable route to computing mapping entropies, an information-theoretic measure of the irreducible detail lost in coarse-graining. We demonstrate these capabilities on diverse molecular systems, including chignolin, a lipid bilayer, and alanine dipeptide, highlighting split-flows as a principled framework for accurate backmapping and systematic evaluation of coarse-grained models.
