Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frédéric Poitevin, Ellen D. Zhong, Gordon Wetzstein

TL;DR

This work introduces a versatile framework to turn biophysical measurements, such as cryo-EM density maps, into 3D atomic models. It is the first diffusion-based method for refining atomic models from cryo-EM maps and for building atomic models from sparse distance matrices.

Abstract

The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these inverse problems for 3D structure determination, but are specialized for a predefined type of measurement. Here, we introduce a versatile framework to turn biophysical measurements, such as cryo-EM density maps, into 3D atomic models. Our method combines a physics-based forward model of the measurement process with a pretrained generative model providing a task-agnostic, data-driven prior. Our method outperforms posterior sampling baselines on linear and non-linear inverse problems. In particular, it is the first diffusion-based method for refining atomic models from cryo-EM maps and building atomic models from sparse distance matrices.

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure S1: Condition number of $\mathbf{R}$ vs. number of residues $N$.
  • Figure S2: Gradient Descent for Linear Constraint. Convergence speed of different optimization techniques on a linear inverse problem (structure completion with a subsampling factor of 2), without the diffusion prior. Using preconditioning with momentum leads to the fastest convergence. The "loss" corresponds to the sum of squared distances between unmasked atom coordinates.
  • Figure S3: Comparison to DPS. We compare ADP-3D to DPS (Chung et al., 2022) for the structure completion task on PDB:8ok3, with a subsampling factor of 4. (a) Distribution of final RMSDs over 64 replicas. For DPS, we perform a sweep over the parameter $\zeta^\prime$, controlling the magnitude of the gradient step. (b) Comparison of runtime for the same experiment.
  • Figure S4: Ablation Study. For atomic model refinement, we combine three sources of conditioning information (incomplete model, density map and sequence) with the data-driven prior of the diffusion model. Here we highlight the importance of each source of conditioning information and of the generative prior. (a) Qualitative reconstructions with the target structure in transparency. (b) RMSD of alpha carbons vs. completeness. We use the same structure as in Fig. 4 (PDB:7pzt), and a cryo-EM map at 2.0 Å resolution.
  • Figure S5: Atomic Model Refinement. Results on the TecA bacterial toxin (PDB:7pzt, 160 residues). (a) Qualitative results. From left to right: the input density map at 2.0 Å resolution, the incomplete model given by ModelAngelo and our refined models (1 output and 5 outputs), overlaid on the target structure in transparency. (b) RMSD of alpha carbons vs. completeness (number of predicted residues / total number of residues) with ModelAngelo (MA) and our method. We run 5 experiments and report the mean of the lowest RMSD on $\alpha$-carbons over 8 replicas ($\pm1$ std). The spread of RMSD is further described in the supplements. The experimental (deposited) resolution is indicated with a dashed line.
  • ...and 1 more figure
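
The linear inverse problem described in the caption of Figure S2 (structure completion without the diffusion prior) can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes random coordinates as a stand-in for a protein, works directly on Cartesian coordinates, and omits preconditioning because the matrix $\mathbf{R}$ is not defined in this excerpt. All names (`completion_loss`, `descend`, and so on) are hypothetical. The loss is the one stated in the caption: the sum of squared distances between unmasked atom coordinates.

```python
import numpy as np

def completion_loss(x, y, mask):
    """Sum of squared distances between observed (unmasked) atom coordinates."""
    return float(np.sum((x[mask] - y) ** 2))

def completion_grad(x, y, mask):
    """Gradient of the loss; it is zero for atoms that are not observed."""
    g = np.zeros_like(x)
    g[mask] = 2.0 * (x[mask] - y)
    return g

def descend(y, mask, n_atoms, steps=200, lr=0.05, momentum=0.0, seed=0):
    """Gradient descent with optional heavy-ball momentum (no prior, no preconditioning)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_atoms, 3))   # random initial structure (assumption)
    v = np.zeros_like(x)                # velocity term used by momentum
    losses = []
    for _ in range(steps):
        losses.append(completion_loss(x, y, mask))
        v = momentum * v - lr * completion_grad(x, y, mask)
        x = x + v
    return x, losses

# Toy target: 16 "atoms" with random 3D coordinates (hypothetical data).
rng = np.random.default_rng(1)
target = rng.normal(size=(16, 3))

# Subsampling factor of 2: every other atom is observed.
mask = np.zeros(16, dtype=bool)
mask[::2] = True
y = target[mask]

x_plain, loss_plain = descend(y, mask, 16, momentum=0.0)
x_mom, loss_mom = descend(y, mask, 16, momentum=0.9)
print(loss_plain[0], loss_plain[-1], loss_mom[-1])
```

In this toy setting the observed atoms converge to their measured positions, while the unobserved atoms never move because their gradient is zero. The measurement alone does not determine them, which is the gap a prior over structures is meant to fill.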