Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
Daniel D. Richman, Jessica Karaguesian, Carl-Mikael Suomivuori, Ron O. Dror
TL;DR
Biomolecular conformational landscapes are difficult to sample with static or single-structure diffusion models. ConforMix adds inference-time conditioning, using Twisted Diffusion Sampling with an RMSD-guided bias (ConforMixRMSD) and MBAR-based reweighting for free-energy estimation, to uncover hidden conformations without retraining. Demonstrated on Boltz-1 and BioEmu, the method recovers domain motions, transporter cycles, cryptic pockets, and RNA transitions, while giving more efficient sampling and free-energy estimates of the landscape. These results suggest that inference-time enhanced sampling can improve the characterization of thermodynamic ensembles and transition pathways in complex biomolecular systems, including proteins, complexes, and nucleic acids.
Abstract
The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to develop computational methods to predict the distribution of conformations, which is far harder to determine experimentally than a static folded structure. We present ConforMix, an inference-time algorithm that enhances sampling of conformational distributions using a combination of classifier guidance, filtering, and free energy estimation. Our approach upgrades diffusion models (whether trained for static structure prediction or conformational generation) to enable more efficient discovery of conformational variability without requiring prior knowledge of major degrees of freedom. ConforMix is orthogonal to improvements in model pretraining and would benefit even a hypothetical model that perfectly reproduced the Boltzmann distribution. Remarkably, when applied to a diffusion model trained for static structure prediction, ConforMix captures structural changes including domain motion, cryptic pocket flexibility, and transporter cycling, while avoiding unphysical states. Case studies of biologically critical proteins demonstrate the scalability, accuracy, and utility of this method.
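To make the ingredients named above concrete, the following is a minimal sketch, not the ConforMix implementation. It assumes a harmonic bias on the RMSD to a reference structure and uses plain single-ensemble importance reweighting as a simpler stand-in for MBAR. The function names (`kabsch_rmsd`, `harmonic_bias`, `unbias_weights`) and the parameters `target`, `k`, and `kT` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kabsch_rmsd(x, ref):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Remove translation by centering both structures.
    x = x - x.mean(axis=0)
    ref = ref - ref.mean(axis=0)
    # Kabsch: singular values of the covariance matrix give the best overlap.
    u, s, vt = np.linalg.svd(x.T @ ref)
    # Sign correction excludes improper rotations (reflections).
    d = np.sign(np.linalg.det(u @ vt))
    sq_dev = (x ** 2).sum() + (ref ** 2).sum() - 2.0 * (s[0] + s[1] + d * s[2])
    return float(np.sqrt(max(sq_dev, 0.0) / len(x)))


def harmonic_bias(rmsd, target, k):
    """Assumed bias energy pulling samples toward a target RMSD value."""
    return 0.5 * k * (rmsd - target) ** 2


def unbias_weights(bias_energies, kT=1.0):
    """Normalized weights that undo a known bias.

    Samples drawn from p(x) * exp(-bias(x) / kT) are reweighted by
    exp(+bias(x) / kT) to estimate averages under p(x).
    """
    log_w = np.asarray(bias_energies, dtype=float) / kT
    log_w = log_w - log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()
```

In this sketch, one would compute `kabsch_rmsd` for each generated structure, evaluate `harmonic_bias` at the chosen target, and pass the resulting energies to `unbias_weights`. Combining samples from several bias windows into one free-energy estimate is what a multistate estimator such as MBAR does, and that step is not shown here.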
