Learning conformational ensembles of proteins based on backbone geometry
Nicolas Wolf, Leif Seute, Vsevolod Viliuga, Simon Wagner, Jan Stühmer, Frauke Gräter
TL;DR
This work tackles efficient sampling of protein conformational ensembles from Boltzmann distributions without relying on evolutionary information. It introduces BBFlow, a conditional SE(3) flow-matching model that encodes the equilibrium backbone geometry and uses a geodesic-based conditional prior to generate samples from p(x | x_eq), the distribution of conformations x given the equilibrium structure x_eq. BBFlow achieves orders-of-magnitude faster inference than state-of-the-art MD emulators, generalizes to multi-chain proteins, and performs well on both natural and de novo proteins, while being trainable from scratch in a few GPU days. Limitations include its dependence on an initial structure, a limited ability to capture rare, long-timescale events, and incomplete modeling of sidechains. Nevertheless, the approach offers a practical, scalable route to MD-like conformational ensembles in design pipelines and large-scale screening.
Abstract
Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics simulations. However, current state-of-the-art approaches rely on fine-tuning pre-trained folding models and on evolutionary sequence information, which limits their applicability and efficiency and can introduce biases. In this work, we propose BBFlow, a flow matching model for sampling protein conformations based solely on backbone geometry. We introduce a geometric encoding of the backbone equilibrium structure as input and propose to condition not only the flow but also the prior distribution on the respective equilibrium structure, eliminating the need for evolutionary information. The resulting model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, is transferable to multi-chain proteins, and can be trained from scratch in a few GPU days. In our experiments, we demonstrate that the proposed model achieves competitive performance at reduced inference time, not only on an established benchmark of naturally occurring proteins but also on de novo proteins, for which evolutionary information is scarce or absent. BBFlow is available at https://github.com/graeter-group/bbflow.
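The abstract describes conditioning the prior distribution on the equilibrium structure, and the TL;DR calls this prior geodesic-based. The following is a minimal sketch of what such a prior could look like when a backbone is represented as per-residue frames (one rotation and one translation per residue, the usual SE(3) representation in frame-based flow-matching models). It is not the BBFlow implementation. The function names, the parameters `t_prior` and `sigma`, and the choice of uniformly random rotations plus Gaussian translations around the equilibrium centroid as unconditional noise are all hypothetical assumptions made for illustration.

```python
import numpy as np


def hat(w):
    """Map a rotation vector in R^3 to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def vee(M):
    """Inverse of hat: extract the vector from a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])


def exp_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation for tiny angles
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * K @ K)


def log_so3(R):
    """Rotation matrix -> rotation vector with angle in [0, pi]."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = vee(R - R.T)  # equals 2 * sin(theta) * axis
    if theta < 1e-7:
        return 0.5 * v  # sin(theta) ~ theta for tiny angles
    if np.pi - theta < 1e-4:
        # sin(theta) ~ 0: recover the axis from (R + I) / 2 ~ axis axis^T
        B = 0.5 * (R + np.eye(3))
        k = np.argmax(np.diag(B))
        axis = B[:, k] / np.linalg.norm(B[:, k])
        if np.dot(axis, v) < 0:  # fix the sign using the residual sin term
            axis = -axis
        return theta * axis
    return theta / (2.0 * np.sin(theta)) * v


def uniform_rotation(rng):
    """Sample a rotation uniformly from SO(3) via a random unit quaternion."""
    q = rng.standard_normal(4)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def interpolate_frames(R_a, x_a, R_b, x_b, t):
    """Interpolate residue frames: rotations along the SO(3) geodesic
    R_a exp(t log(R_a^T R_b)), translations linearly."""
    R_t = np.stack([Ra @ exp_so3(t * log_so3(Ra.T @ Rb))
                    for Ra, Rb in zip(R_a, R_b)])
    x_t = (1.0 - t) * x_a + t * x_b
    return R_t, x_t


def conditional_prior(R_eq, x_eq, t_prior, sigma, rng):
    """Hypothetical equilibrium-conditioned prior: draw unconditional noise
    frames, then move them a fraction t_prior along the geodesic towards
    the equilibrium frames (R_eq: (N, 3, 3), x_eq: (N, 3))."""
    n = len(R_eq)
    R_noise = np.stack([uniform_rotation(rng) for _ in range(n)])
    x_noise = x_eq.mean(axis=0) + sigma * rng.standard_normal(x_eq.shape)
    return interpolate_frames(R_noise, x_noise, R_eq, x_eq, t_prior)
```

With `t_prior = 0` this reduces to the unconditional noise distribution, and with `t_prior = 1` it collapses onto the equilibrium structure, so the (assumed) parameter trades off diversity against proximity to `x_eq`. In the same sketch, a flow-matching training path between a prior sample and a target conformation could reuse `interpolate_frames` at a random time `t`.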
