Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules
Joseph C. Kim, David Bloore, Karan Kapoor, Jun Feng, Ming-Hong Hao, Mengdi Wang
TL;DR
The paper addresses scalable Boltzmann sampling for macromolecules by introducing a split-channel normalizing-flow architecture that operates in reduced internal coordinates and uses gated-attention coupling layers. A multi-stage training regimen blends maximum-likelihood and energy-based objectives, with a backbone-focused 2-Wasserstein loss on distance matrices that enforces global structural fidelity while preserving local detail. Evaluations on HP35 and protein G show improved backbone geometry, low-energy generated conformations, and the discovery of metastable states not present in the training data, outperforming neural spline flow (NSF) baselines. These advances enable more efficient and physically grounded sampling of protein conformations, with potential impact on drug design and on understanding functional states. Transferability across proteins and further methodological improvements remain open directions.
Abstract
The Boltzmann distribution of a protein provides a roadmap to all of its functional states. Normalizing flows are a promising tool for modeling this distribution, but current methods become computationally intractable for typical pharmacological targets because of the size of the system, the heterogeneity of the intramolecular potential energy, and long-range interactions. To remedy these issues, we present a novel flow architecture that utilizes split channels and gated attention to efficiently learn the conformational distribution of proteins defined by internal coordinates. We show that by utilizing a 2-Wasserstein loss, one can smooth the transition from maximum likelihood training to energy-based training, enabling the training of Boltzmann Generators for macromolecules. We evaluate our model and training strategy on villin headpiece HP35(nle-nle), a 35-residue subdomain, and protein G, a 56-residue protein. We demonstrate that standard architectures and training strategies, such as maximum likelihood training alone, fail, while our novel architecture and multi-stage training strategy are able to model the conformational distributions of protein G and HP35.
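To make the idea of a 2-Wasserstein loss on distance matrices concrete, the following is a minimal sketch, not the authors' implementation. It assumes the loss can be approximated per atom pair: for each pairwise distance, it computes the exact squared 2-Wasserstein distance between the generated and reference one-dimensional empirical distributions (equal sample counts, via sorting) and averages over pairs. The function names `pairwise_distances`, `w2_squared_1d`, and `distance_matrix_w2_loss` are hypothetical, and the code is plain Python for illustration, so it is not differentiable as written.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Flattened upper triangle of the distance matrix of one conformation.

    `coords` is a sequence of (x, y, z) tuples, e.g. backbone atom positions.
    """
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def w2_squared_1d(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches sorted samples,
    so the distance is the mean squared gap between order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def distance_matrix_w2_loss(generated, reference):
    """Average per-pair squared W2 between two ensembles of conformations.

    Assumption: each atom pair is treated as an independent 1-D marginal,
    which ignores correlations between different pairwise distances.
    """
    gen = [pairwise_distances(c) for c in generated]
    ref = [pairwise_distances(c) for c in reference]
    n_pairs = len(gen[0])
    total = 0.0
    for k in range(n_pairs):
        total += w2_squared_1d([d[k] for d in gen], [d[k] for d in ref])
    return total / n_pairs


# Toy ensembles of two-atom "conformations" (illustrative values only).
reference = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (3, 0, 0)]]
generated = [[(0, 0, 0), (2, 0, 0)], [(0, 0, 0), (4, 0, 0)]]
loss = distance_matrix_w2_loss(generated, reference)
```

Because the loss compares distributions of interatomic distances rather than individual samples, it is invariant to global rotation and translation and to the order of conformations within each ensemble. In practice such a term would be written in an automatic-differentiation framework so that gradients can flow back to the flow parameters.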
