
Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time

Daniel D. Richman, Jessica Karaguesian, Carl-Mikael Suomivuori, Ron O. Dror

TL;DR

Biomolecular conformational landscapes are challenging to sample with static or single-structure diffusion models. ConforMix introduces inference-time conditioning via Twisted Diffusion Sampling and an RMSD-guided bias (ConforMixRMSD), plus MBAR-based reweighting for free-energy estimation, to uncover hidden conformations without retraining. The method is demonstrated on Boltz-1 and BioEmu. It recovers domain motions, transporter cycles, cryptic pockets, and RNA transitions, and its free-energy estimates converge faster than those from default sampling. These results show that inference-time enhanced sampling can substantially improve our ability to characterize thermodynamic ensembles and transition pathways in complex biomolecular systems, with broad applicability to proteins, complexes, and nucleic acids.
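The RMSD-guided bias can be pictured as a restraint on the RMSD between a sample and a reference structure. For ConforMixRMSD, that reference is the default prediction. The sketch below assumes a harmonic form U = (k/2)(RMSD - target)^2 evaluated after optimal (Kabsch) superposition. The functional form, the names `kabsch_residuals` and `rmsd_bias`, and the parameters `target` and `k` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def kabsch_residuals(x, ref):
    """Superimpose `ref` onto `x` (both (N, 3) arrays) with the Kabsch algorithm.

    Returns the per-atom residuals x_c - R ref_c and the RMSD, where x_c and
    ref_c are centered coordinates and R is the optimal proper rotation.
    """
    xc = x - x.mean(axis=0)
    rc = ref - ref.mean(axis=0)
    h = rc.T @ xc                               # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # d = -1 would signal a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation (det = +1)
    resid = xc - rc @ rot.T                     # row i: x_c[i] - R @ ref_c[i]
    rmsd = np.sqrt((resid ** 2).sum() / len(x))
    return resid, rmsd


def rmsd_bias(x, ref, target, k):
    """Hypothetical harmonic RMSD restraint U = k/2 * (RMSD(x, ref) - target)^2.

    Returns (U, dU/dx). By the envelope theorem the optimal superposition can
    be held fixed when differentiating, so dRMSD/dx_i = resid_i / (N * RMSD).
    """
    resid, rmsd = kabsch_residuals(x, ref)
    energy = 0.5 * k * (rmsd - target) ** 2
    if rmsd < 1e-12:
        # RMSD is not differentiable at zero; return a zero gradient there.
        return energy, np.zeros_like(x)
    grad = k * (rmsd - target) * resid / (len(x) * rmsd)
    return energy, grad
```

In a guided sampler, -∇U would be added to the model's score, which pushes samples toward the target RMSD. This sketch does not cover how ConforMix applies the bias across noise levels or how it chooses the series of targets.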

Abstract

The function of biomolecules such as proteins depends on their ability to interconvert between a wide range of structures or "conformations." Researchers have endeavored for decades to develop computational methods to predict the distribution of conformations, which is far harder to determine experimentally than a static folded structure. We present ConforMix, an inference-time algorithm that enhances sampling of conformational distributions using a combination of classifier guidance, filtering, and free energy estimation. Our approach upgrades diffusion models -- whether trained for static structure prediction or conformational generation -- to enable more efficient discovery of conformational variability without requiring prior knowledge of major degrees of freedom. ConforMix is orthogonal to improvements in model pretraining and would benefit even a hypothetical model that perfectly reproduced the Boltzmann distribution. Remarkably, when applied to a diffusion model trained for static structure prediction, ConforMix captures structural changes including domain motion, cryptic pocket flexibility, and transporter cycling, while avoiding unphysical states. Case studies of biologically critical proteins demonstrate the scalability, accuracy, and utility of this method.
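Twisted Diffusion Sampling, named in the TL;DR, is a sequential Monte Carlo (particle-filtering) method. It carries a population of partial diffusion trajectories together with importance weights and periodically resamples them. Assuming the abstract's "filtering" refers to this machinery, the generic weight-handling step can be sketched as follows. The twisting-function weight update itself is not shown. The helper names and the 0.5 effective-sample-size threshold are illustrative choices, not values from the paper.

```python
import numpy as np


def normalized_weights(log_w):
    """Convert unnormalized log importance weights into probabilities, stably."""
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()


def effective_sample_size(log_w):
    """Kish effective sample size 1 / sum(w_i^2) of the normalized weights."""
    w = normalized_weights(log_w)
    return 1.0 / np.sum(w ** 2)


def systematic_resample(log_w, rng):
    """Return particle indices drawn by systematic resampling.

    A single uniform offset is shared by n evenly spaced positions, which has
    lower variance than drawing n independent multinomial samples.
    """
    w = normalized_weights(log_w)
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(w)
    cdf[-1] = 1.0  # guard against round-off in the last entry
    # side="right" never selects particles whose weight is exactly zero
    return np.searchsorted(cdf, positions, side="right")


def filter_step(particles, log_w, rng, ess_frac=0.5):
    """Resample when ESS < ess_frac * n, then reset to equal (zero) log weights."""
    n = len(log_w)
    if effective_sample_size(log_w) < ess_frac * n:
        idx = systematic_resample(log_w, rng)
        return particles[idx], np.zeros(n)
    return particles, log_w
```

Systematic resampling is used here because it is simple and low-variance. Multinomial or residual resampling would also be valid choices.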

Paper Structure

This paper contains 23 sections, 6 equations, 23 figures, 4 tables, 2 algorithms.

Figures (23)

  • Figure 1: (A) ConforMix adds conditioning to diffusion-based structure prediction models, enabling deeper and more efficient exploration. (B) ConforMix uses a series of bias potentials to target new states. ConforMixRMSD is an instantiation that biases sampling away from default predictions to sample conformational transitions without requiring any user intervention. Sample reweighting can be applied to recover the ground state distribution.
  • Figure 2: ConforMixRMSD uncovers conformational states that are not sampled by default. Conformational sampling of (A) a domain motion protein and (B) a membrane transporter. For each system, Top Left: density of sampling relative to reference experimental structures. Bottom Left: projection of the ConforMixRMSD sampled structures (orange) onto the first two principal components computed from their internal atomic distances. Experimentally determined reference structures (grey) are projected onto the same space. Right: reference structures and the closest structure generated by each sampling approach (lowest RMSD).
  • Figure 3: ConforMixRMSD samples domain motion, revealing transitions consistent with experiment while avoiding noisy paths. (A) Principal component analysis of pairwise $C\alpha$ distances of samples generated for a domain motion protein, dppA. Default sampling extends toward but does not reach the open conformation, while ConforMixRMSD traces an opening/closing path. Although AFCluster generates open and closed states, it also samples many other large fluctuations, which makes the relevant motion harder to identify. (B) Analysis of variance of samples generated by default sampling, ConforMixRMSD, and AFCluster, all used with Boltz. Each point describes results for one protein. Default sampling and ConforMixRMSD tend to produce samples whose variance is concentrated along the direction of domain motion between reference structures. Note that this metric captures alignment of the sampling direction with experimental conformational differences, but not the extent of sampling along that direction: a method may align well without sampling both experimental conformations. AFCluster often produces samples whose structural differences do not match the direction of domain motion between references, or that lack dominant directions of variance, indicating off-path sampling. Black-outlined points denote dppA.
  • Figure 4: Exploration of biological macromolecules of interest. (A) ConforMixRMSD-Boltz recovers all three experimentally determined conformations of the SemiSWEET transporter. Default Boltz sampling recovers only the occluded state. PCA demonstrates how the samples capture the major motions as the transporter opens to the inward (intracellular) or outward (extracellular) sides. (B) Preliminary application of ConforMixRMSD-Boltz to RNA structure shows it can recapitulate MD-observed transitions.
  • Figure 5: ConforMix sampling in BioEmu enables faster convergence of free energy estimates than default sampling. Using a series of guidance potentials based on RMSD to the native state enables systematic collection of samples that, in turn, enable more rapid free energy estimation. The estimated free energy difference is between the RMSD value with the lowest free energy (3 Å from the reference state) and a more extended conformation at 7.5 Å. The 90% confidence interval from 50 bootstrap resamples is shown.
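The MBAR reweighting named in the TL;DR, and the reweighting step in Figure 1, can be sketched with the standard MBAR self-consistent equations. This minimal NumPy version assumes that each window k samples the model's default distribution multiplied by exp(-u_k(x)), where u_k is a known bias in units of kT. Guided diffusion samplers satisfy this assumption only approximately. The names `mbar` and `free_energy_profile` are hypothetical.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def mbar(u_kn, n_k, tol=1e-8, max_iter=10_000):
    """Solve the MBAR self-consistent equations by fixed-point iteration.

    u_kn[k, n]: reduced bias energy (kT) of window k evaluated on pooled sample n;
    the unbiased state has zero bias. n_k[k]: number of samples drawn in window k.
    Returns window free energies relative to the unbiased state, plus normalized
    weights that reweight the pooled samples to the unbiased distribution.
    """
    log_n = np.log(np.asarray(n_k, dtype=float))
    f_k = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log sum_l N_l exp(f_l - u_l(x_n)) for every pooled sample n
        log_denom = _logsumexp(log_n[:, None] + f_k[:, None] - u_kn, axis=0)
        f_new = -_logsumexp(-u_kn - log_denom[None, :], axis=1)
        f_new -= f_new[0]  # free energies are defined only up to a constant
        converged = np.max(np.abs(f_new - f_k)) < tol
        f_k = f_new
        if converged:
            break
    log_denom = _logsumexp(log_n[:, None] + f_k[:, None] - u_kn, axis=0)
    f_unbiased = -_logsumexp(-log_denom, axis=0)  # same formula with u = 0
    log_w = -log_denom
    w = np.exp(log_w - log_w.max())
    return f_k - f_unbiased, w / w.sum()


def free_energy_profile(cv, w, bins):
    """-ln of the reweighted histogram of a collective variable (e.g. RMSD), in kT."""
    p, edges = np.histogram(cv, bins=bins, weights=w)
    with np.errstate(divide="ignore"):
        f = -np.log(p)  # empty bins become +inf
    return f - np.min(f[np.isfinite(f)]), edges
```

A free energy difference like the one in Figure 5 is the difference between two bins of such a profile. A bootstrap interval can be obtained by resampling each window's samples with replacement and rerunning `mbar`. The `pymbar` package provides a production implementation with analytical uncertainty estimates.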
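Figures 2 and 3 project samples onto principal components of internal (pairwise Cα) distances. Distances are invariant to rigid-body motion, so no superposition is needed before the PCA. The minimal sketch below assumes the Cα coordinates have already been extracted into an (S, N, 3) array. The function names are illustrative. The feature count grows as N(N-1)/2, which is manageable for single proteins.

```python
import numpy as np


def pairwise_distance_features(coords):
    """Flatten the upper triangle of each structure's Cα distance matrix.

    coords: (S, N, 3) array of S structures with N Cα atoms each.
    Returns an (S, N*(N-1)/2) feature matrix.
    """
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (S, N, N)
    iu = np.triu_indices(coords.shape[1], k=1)     # upper triangle, no diagonal
    return dist[:, iu[0], iu[1]]


def fit_pca(features, n_components=2):
    """PCA via SVD of mean-centered features.

    Returns (mean, components, explained variance ratio of each component).
    """
    mean = features.mean(axis=0)
    _, s, vt = np.linalg.svd(features - mean, full_matrices=False)
    var = s ** 2
    return mean, vt[:n_components], var[:n_components] / var.sum()


def project(features, mean, components):
    """Project features (e.g. from reference structures) onto fitted components."""
    return (features - mean) @ components.T
```

To mirror the captions, fit the PCA on the sampled structures and then call `project` on the experimental reference structures' features, so that both sets share the same axes.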