Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data
Rishwanth Raghu, Axel Levy, Gordon Wetzstein, Ellen D. Zhong
TL;DR
CryoBoltz is an inference-time framework that guides a pretrained diffusion-based protein structure predictor with heterogeneous cryo-EM data, sampling conformational ensembles that reflect the experimental evidence. It combines global shape constraints with a physics-informed local forward-model term in a multiscale guidance pipeline, so that atomic models can be built into heterogeneous density maps without retraining. Across synthetic and real datasets, CryoBoltz increases sampling diversity, improves fit in flexible regions such as antibody CDR loops, and outperforms existing diffusion baselines and model-building methods. The approach offers a practical path to exploring biomolecular conformational landscapes while leveraging large-scale structural priors.
Abstract
Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes and often depend on heuristic multiple sequence alignment (MSA) subsampling to generate alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity, but transforming raw experimental data into atomic models still requires arduous pipelines. Here, we bridge the gap between these modalities, combining cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained biomolecular structure prediction model using both global and local structural constraints derived from density maps, driving predictions towards conformational states consistent with the experimental data. We demonstrate that this flexible yet powerful inference-time approach allows us to build atomic models into heterogeneous cryo-EM maps across a variety of dynamic biomolecular systems, including transporters and antibodies. Code is available at https://github.com/ml-struct-bio/cryoboltz.
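To make the idea of steering atom coordinates toward a density map concrete, the following is a minimal NumPy sketch of a single gradient-guidance update, the kind of nudge that inference-time guidance schemes interleave with denoising steps. It is not the CryoBoltz implementation, and every name in it is hypothetical. The pretrained denoiser is omitted entirely. The forward model is an assumed sum of isotropic Gaussians, one per atom. The coarse and fine blur widths (`sigma_coarse`, `sigma_fine`) are illustrative stand-ins for a global shape term and a local forward-model term; the paper's actual constraints may differ.

```python
import numpy as np


def simulate_density(coords, grid, sigma):
    """Hypothetical toy forward model: one isotropic Gaussian per atom.

    coords: (N, 3) atom positions; grid: (M, 3) voxel centers. Returns (M,) density.
    """
    d2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)  # (M, N)
    return np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1)


def density_loss_and_grad(coords, grid, target, sigma):
    """Loss 0.5 * ||rho(coords) - target||^2 and its analytic gradient w.r.t. coords."""
    diff = grid[:, None, :] - coords[None, :, :]  # (M, N, 3)
    g = np.exp(-(diff**2).sum(axis=-1) / (2.0 * sigma**2))  # (M, N)
    resid = g.sum(axis=1) - target  # (M,)
    loss = 0.5 * float((resid**2).sum())
    # d rho_m / d x_n = g_mn * (grid_m - x_n) / sigma^2, chained with the residual.
    grad = ((resid[:, None] * g)[:, :, None] * diff).sum(axis=0) / sigma**2  # (N, 3)
    return loss, grad


def guided_update(coords, grid, target_coarse, target_fine, step=1e-3,
                  sigma_coarse=4.0, sigma_fine=1.5, w_fine=1.0):
    """One hypothetical guidance step: descend a coarse (global) plus fine (local) loss."""
    _, g_coarse = density_loss_and_grad(coords, grid, target_coarse, sigma_coarse)
    _, g_fine = density_loss_and_grad(coords, grid, target_fine, sigma_fine)
    return coords - step * (g_coarse + w_fine * g_fine)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    axis = np.arange(-6.0, 7.0, 1.0)  # 13 voxels per side, 1 Å spacing (toy grid)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
    true_coords = rng.uniform(-3.0, 3.0, size=(5, 3))  # stand-in for the "experimental" state
    target_coarse = simulate_density(true_coords, grid, 4.0)
    target_fine = simulate_density(true_coords, grid, 1.5)
    coords = true_coords + rng.normal(scale=0.7, size=true_coords.shape)  # stand-in prior sample
    for _ in range(200):
        coords = guided_update(coords, grid, target_coarse, target_fine)
    rmsd = np.sqrt(((coords - true_coords) ** 2).sum(axis=1).mean())
    print("RMSD to target state:", rmsd)
```

The gradient is written out analytically only to keep the sketch dependency-free. A real pipeline would more likely let an autodiff framework differentiate the forward model and would apply the update inside the sampler's denoising loop rather than in isolation.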
