Split-Flows: Measure Transport and Information Loss Across Molecular Resolutions

Sander Hummerich, Tristan Bereau, Ullrich Köthe

TL;DR

Coarse-grained molecular models accelerate simulations but discard microscopic detail, a loss quantified by the mapping entropy over the fiber of fine-grained configurations that share the same coarse-grained state. The authors introduce split-flows, a continuous-time normalizing-flow framework that augments coarse-grained states with noise and learns a bijective map to fine-grained configurations. This enables backmapping by sampling from $\pi_{r|R}$ and tractable estimation of the configuration-dependent mapping entropy via density evolution, using $\log |\det J|$ or its divergence form. The method is demonstrated on chignolin, a lipid bilayer, and alanine dipeptide, combining two-sided flow matching with fiber sampling to deliver accurate reconstructions and a principled assessment of information loss across resolutions. The approach provides a tool for evaluating and improving coarse-grained models, offers pathways to scale to larger biomolecules with autoregressive or related strategies, and connects to thermodynamic quantities through the PMF decomposition $W(R)=E(R)-TS(R)$.
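
The density evolution behind $\log |\det J|$ and its divergence form can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical two-dimensional velocity field `v(x) = tanh(W x)` with a hand-picked matrix `W`, integrates it from $t=0$ to $t=1$ with a fixed-step RK4 scheme, and accumulates the divergence along the trajectory. By the instantaneous change-of-variables formula for continuous normalizing flows, that integral equals $\log |\det J|$ of the flow map, which the code compares against a finite-difference Jacobian.

```python
import numpy as np

# Hypothetical toy velocity field (an assumption, not the authors' network).
W = np.array([[0.5, -0.3], [0.2, 0.4]])

def velocity(x):
    return np.tanh(W @ x)

def divergence(x):
    # div v = sum_i dv_i/dx_i = sum_i (1 - tanh^2((W x)_i)) * W_ii
    return np.sum((1.0 - np.tanh(W @ x) ** 2) * np.diag(W))

def augmented_rhs(state):
    # state = (x, accumulated log|det J|); both are advanced by the same solver.
    x = state[:-1]
    return np.append(velocity(x), divergence(x))

def flow(x0, n_steps=200):
    """Integrate from t=0 to t=1 with RK4; return x(1) and log|det J|."""
    state = np.append(x0, 0.0)
    h = 1.0 / n_steps
    for _ in range(n_steps):
        k1 = augmented_rhs(state)
        k2 = augmented_rhs(state + 0.5 * h * k1)
        k3 = augmented_rhs(state + 0.5 * h * k2)
        k4 = augmented_rhs(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[:-1], state[-1]

def logdet_finite_difference(x0, eps=1e-5):
    """Reference value: log|det| of the flow-map Jacobian by central differences."""
    dim = x0.size
    jac = np.zeros((dim, dim))
    for j in range(dim):
        step = np.zeros(dim)
        step[j] = eps
        jac[:, j] = (flow(x0 + step)[0] - flow(x0 - step)[0]) / (2 * eps)
    return np.log(abs(np.linalg.det(jac)))

x0 = np.array([0.7, -1.2])
x1, logdet_div = flow(x0)
logdet_fd = logdet_finite_difference(x0)

# Density evolution with a standard-normal density at t=0 (also an assumption):
log_p0 = -0.5 * x0 @ x0 - np.log(2.0 * np.pi)
log_p1 = log_p0 - logdet_div  # log-density of x1 under the transported measure
```

The divergence route avoids forming the Jacobian at all, which is what makes it attractive in high dimensions; in practice the exact divergence is often replaced by a stochastic trace estimate, which this sketch does not attempt.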

Abstract

By reducing resolution, coarse-grained models greatly accelerate molecular simulations, unlocking access to long-timescale phenomena, though at the expense of microscopic information. Recovering this fine-grained detail is essential for tasks that depend on atomistic accuracy, making backmapping a central challenge in molecular modeling. We introduce split-flows, a novel flow-based approach that reinterprets backmapping as a continuous-time measure transport across resolutions. Unlike existing generative strategies, split-flows establish a direct probabilistic link between resolutions, enabling expressive conditional sampling of atomistic structures and -- for the first time -- a tractable route to computing mapping entropies, an information-theoretic measure of the irreducible detail lost in coarse-graining. We demonstrate these capabilities on diverse molecular systems, including chignolin, a lipid bilayer, and alanine dipeptide, highlighting split-flows as a principled framework for accurate backmapping and systematic evaluation of coarse-grained models.

Paper Structure

This paper contains 34 sections, 4 theorems, 63 equations, 9 figures, 9 tables, 2 algorithms.

Key Result

Proposition A.1

Let ${\bm{r}} \in \mathbb{R}^n$ denote a fine-grained configuration, and let ${\bm{R}} = M({\bm{r}}) \in \mathbb{R}^N$ be the associated coarse-grained representative obtained by a measurable coarse-graining map $M: \mathbb{R}^n \to \mathbb{R}^N$. Suppose the fine-grained configurations are Boltzmann-distributed, where $u({\bm{r}})$ is the potential energy governing the fine-grained distribution and $Z$ is the ...
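
The decomposition $W(R)=E(R)-TS(R)$ can be checked numerically in a discrete analogue. The sketch below is an illustration under stated assumptions rather than the proposition itself: the microstate energies `u`, the fiber assignment `labels`, and the unit choice `kT = 1` are all hypothetical, the entropy is the Shannon entropy of the conditional distribution on each fiber, and $W$ is defined only up to an additive constant.

```python
import numpy as np

kT = 1.0  # thermal energy k_B * T in arbitrary units (assumed)

# Hypothetical microstate energies u_i and coarse-graining map i -> coarse label.
u = np.array([0.0, 0.5, 1.0, 0.2, 0.2, 3.0])
labels = np.array([0, 0, 0, 1, 1, 1])

def decompose_pmf(coarse_label):
    """Return (W, E, T*S) for one coarse-grained state."""
    fiber = labels == coarse_label       # microstates mapped to this coarse state
    weights = np.exp(-u[fiber] / kT)     # Boltzmann factors on the fiber
    z = weights.sum()                    # fiber partition sum
    p = weights / z                      # conditional distribution on the fiber
    W = -kT * np.log(z)                  # potential of mean force, up to a constant
    E = np.sum(p * u[fiber])             # conditional mean energy
    TS = -kT * np.sum(p * np.log(p))     # temperature times conditional entropy
    return W, E, TS

results = {int(label): decompose_pmf(label) for label in np.unique(labels)}
```

For every coarse state the returned values satisfy `W == E - TS` up to floating-point error, since $-\sum_i p_i \ln p_i = \langle u \rangle / k_B T + \ln z$ on each fiber.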

Figures (9)

  • Figure 1: (A) Split-flows connect fine- and coarse-grained densities, $\pi_r$ and $\pi_R$, respectively, at different molecular resolutions via a continuous-time measure transport that maps the excess degrees of freedom of the fine-grained resolution to a simple noise distribution, $\pi_{\epsilon \mid R}$. (B) This enables sampling from the conditional density $\pi_{r \mid R}$, i.e., generative backmapping, and quantifies the information loss inherent in the coarse-grained representation.
  • Figure 2: Bottom-up coarse-graining defines a many-to-one mapping operator $M$ that reduces a set $\Omega_R({\bm{R}})$ of fine-grained configurations to a single coarse-grained representative ${\bm{R}}$.
  • Figure 3: Split-flows define a one-to-one map between configurations of different resolutions. The lower-dimensional samples ${\bm{R}}$ are augmented with noise $\boldsymbol{\epsilon}$ to resolve the degeneracy induced by the dimensionality gap. The flow $\phi_t$ connects the joint density $\pi_R \times \pi_{\epsilon \mid R}$ at $t=0$ with the density $\pi_r$ of high-dimensional samples ${\bm{r}}$ at $t=1$.
  • Figure 4: Log densities in the plane of the first two components of TICA. We present the projected log densities of the original simulated configurations as well as backmapped configurations using reference methods and our split-flows. The projection separates the folded (A), unfolded (B), and misfolded (C) modes of chignolin.
  • Figure 5: Average information loss per removed degree of freedom in the $C_\alpha$ representation of chignolin along an MD trajectory. We analyze a short section of the simulation starting in a folded state (A), followed by a partial separation of the two strands (B), and returning to the folded state (C).
  • ...and 4 more figures
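
The noise augmentation described in the captions of Figures 1 and 3 can be made concrete with a small linear toy. The sketch below is not a split-flow: a closed-form linear bijection stands in for the learned continuous-time flow, and the Gaussian fine-grained density `Sigma`, the averaging map `coarse_grain`, and the helper `backmap` are hypothetical choices made for illustration. It shows how a one-dimensional coarse state $R$ plus one noise variable $\epsilon$ maps one-to-one onto a two-dimensional fine-grained configuration ${\bm{r}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fine-grained density: zero-mean Gaussian over r = (r1, r2).
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])

# Linear change of variables (R, d) = A r: R = M(r) is the mean of the two
# coordinates, d = r1 - r2 is the degree of freedom removed by coarse-graining.
A = np.array([[0.5, 0.5], [1.0, -1.0]])
A_inv = np.linalg.inv(A)
C = A @ Sigma @ A.T  # covariance of (R, d)

# Gaussian conditional d | R: mean slope * R, standard deviation cond_std.
slope = C[1, 0] / C[0, 0]
cond_std = np.sqrt(C[1, 1] - C[1, 0] ** 2 / C[0, 0])

def coarse_grain(r):
    """Many-to-one map M: average the two fine-grained coordinates."""
    return 0.5 * r[..., 0] + 0.5 * r[..., 1]

def backmap(R, eps):
    """One-to-one map (R, eps) -> r; eps is standard-normal noise."""
    d = slope * R + cond_std * eps
    return np.stack([R, d], axis=-1) @ A_inv.T

n_samples = 200_000
R = rng.normal(0.0, np.sqrt(C[0, 0]), size=n_samples)  # coarse-grained samples
eps = rng.normal(size=n_samples)                        # augmenting noise
r = backmap(R, eps)                                     # backmapped configurations
```

Every backmapped sample lands on the fiber of its coarse state, so `coarse_grain(r)` reproduces `R`, and the sample covariance of `r` approaches `Sigma`. In this linear toy the Jacobian is constant, so the information loss does not depend on the configuration; capturing a configuration-dependent loss requires a nonlinear map.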

Theorems & Definitions (15)

  • Proposition A.1: Decomposition of the coarse-grained potential
  • proof
  • Remark A.1.1
  • Remark A.1.2
  • Proposition A.2: Computation of fiber averages with split-flows
  • proof
  • Remark A.2.1: Practical estimation
  • Proposition A.3: Mapping entropy estimation with split-flows
  • proof
  • Remark A.3.1: Mapping entropy estimation with continuous normalizing flows
  • ...and 5 more