Generative modeling of protein ensembles guided by crystallographic electron densities

Sai Advaith Maddipatla, Nadav Bojan Sellam, Sanketh Vedula, Ailie Marx, Alex Bronstein

TL;DR

This work tackles the challenge of reconstructing dynamic protein ensembles from crystallographic electron density by formulating an inverse problem that aligns multiple conformations with observed density. It introduces a density-guided sampling framework that uses a pre-trained diffusion model (Chroma) as a prior and a differentiable forward model of electron density, employing non-i.i.d. score guidance across the entire ensemble and matching-pursuit filtering to prevent overfitting to noise. The authors demonstrate that their approach recovers multi-modal altloc conformations and yields improved density alignment compared with unconditional sampling, particularly in regions with bimodal density. This methodology enables more accurate, data-driven modeling of protein dynamics and opens avenues for applying similar strategies to other experimental modalities and larger systems.

Abstract

Proteins are dynamic, adopting ensembles of conformations. The nature of this conformational heterogeneity is imprinted in the raw electron density measurements obtained from X-ray crystallography experiments. Fitting an ensemble of protein structures to these measurements is a challenging, ill-posed inverse problem. We propose a non-i.i.d. ensemble guidance approach to solve this problem using existing protein structure generative models and demonstrate that it accurately recovers complicated multi-modal alternate protein backbone conformations observed in certain single-crystal measurements.
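To make the idea of non-i.i.d. ensemble guidance concrete, here is a minimal one-dimensional sketch. It is an illustration under strong assumptions, not the authors' implementation: each ensemble member is a single "atom" that contributes a Gaussian blob of density, the forward model is the average of those blobs, and the guidance signal is the gradient of a squared-error misfit to the observed density. The diffusion prior (Chroma) is omitted entirely and replaced by plain gradient descent. All names (`member_density`, `ensemble_density`, `density_loss`, `guidance_gradient`), the blob width, and the step size are hypothetical. The point it shows is that the misfit depends on the ensemble as a whole, so the gradient for each member is coupled to all the others.

```python
import numpy as np

def member_density(x, grid, width=0.5):
    """Toy density of one ensemble member: a Gaussian blob centred at x."""
    return np.exp(-0.5 * ((grid - x) / width) ** 2)

def ensemble_density(positions, grid, width=0.5):
    """Toy forward model: the density of the ensemble is the member average."""
    return np.mean([member_density(x, grid, width) for x in positions], axis=0)

def density_loss(positions, grid, observed, width=0.5):
    """Squared-error misfit between the ensemble density and the observed one."""
    residual = ensemble_density(positions, grid, width) - observed
    return 0.5 * np.sum(residual ** 2)

def guidance_gradient(positions, grid, observed, width=0.5):
    """Analytic gradient of density_loss with respect to every member position.

    The residual is computed from the whole ensemble, which is what couples
    the members to each other (the non-i.i.d. part of this sketch).
    """
    residual = ensemble_density(positions, grid, width) - observed
    n = len(positions)
    grads = np.empty(n)
    for i, x in enumerate(positions):
        # Derivative of the Gaussian blob with respect to its centre x.
        d_rho_dx = member_density(x, grid, width) * (grid - x) / width ** 2
        grads[i] = np.sum(residual * d_rho_dx) / n
    return grads

# Assumed toy data: a bimodal "observed" density with two alternate locations.
grid = np.linspace(-4.0, 4.0, 201)
observed = ensemble_density([-1.5, 1.5], grid)

# Start from a unimodal ensemble and follow the guidance gradient alone.
positions = np.linspace(-0.5, 0.5, 8)
for _ in range(1000):
    positions = positions - 0.02 * guidance_gradient(positions, grid, observed)
print(np.round(positions, 2))
```

In a real guided sampler this gradient would be added to the score of the generative prior at each denoising step rather than applied on its own; the sketch only isolates the data term.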

Paper Structure

This paper contains 22 sections, 5 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: The proposed density-guided protein ensemble generation method. The diffusion model is used to sample a non-i.i.d. ensemble from which the likelihood of the observed electron density is calculated and used to guide the sampling.
  • Figure 2: Density-guided Chroma fits the density better than unguided Chroma, and recovers two known alternative locations accurately. Full protein structure is displayed as white cartoons. The sampled ensembles in the region of interest are depicted as sticks and overlaid on the experimental density $1 \sigma$-isosurface. The insets show the agreement of $F_\mathrm{c}$ (red) with the observed density $F_\mathrm{o}$ (blue), visualized as $1 \sigma$-isomeshes. Cosine similarities between $F_\mathrm{c}$ and $F_\mathrm{o}$ are reported below each panel. Density guidance produces consistently better density alignment and correctly captures the multi-modal nature of the observed density.
  • Figure 3: Density-guided Chroma accurately captures the bimodal distribution of the backbone conformations, while unconditional sampling consistently fails to represent it. Negative scores represent proximity to the modeled altloc A, while positive scores correspond to altloc B.
  • Figure A1: Comparison of conditional and unconditional sampling in bimodally and unimodally distributed regions of protein 7EC8:A. The figure illustrates the differences between unconditional sampling (first row) and density-guided sampling (second row) in three regions of the protein (left to right): residues 143-146 of chain A, which exhibit an explicitly modeled dual conformation (two altlocs); the same position in chain B, originally modeled as a single conformation with high B-factors; and residues 205-208 in chain B, originally modeled as a single conformation with low B-factors. Our density-guided sampling consistently describes the flexible region in both chains as a bimodal distribution while producing a tightly distributed ensemble for the third, low-B-factor region.
  • Figure A2: Comparison of density fitting for residues 205-208 of protein 7EC8. The unguided approach (left) does not adequately align the calculated density with the observed density, whereas the guided approach (right) demonstrates significantly better alignment. The observed density is represented by the light blue surface, while the calculated density from the samples is shown as a dark red isomesh.
  • ...and 1 more figures
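
The cosine similarity reported in the Figure 2 caption can be illustrated with a short sketch. It assumes that the calculated and observed maps are available as arrays sampled on the same grid; whether the authors compute the score in real or reciprocal space, or over a masked region, is not stated here, and the function name `density_cosine_similarity` is hypothetical.

```python
import numpy as np

def density_cosine_similarity(rho_calc, rho_obs):
    """Cosine similarity between two density maps sampled on the same grid."""
    a = np.asarray(rho_calc, dtype=float).ravel()
    b = np.asarray(rho_obs, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must be sampled on the same grid")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero map")
    return float(np.dot(a, b) / denom)
```

Because the score is normalized by both map norms, it is insensitive to an overall scale factor between the calculated and observed densities, which makes it convenient for comparing maps that are not on an absolute scale.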