RecCrysFormer: Refined Protein Structural Prediction from 3D Patterson Maps via Recycling Training Runs

Tom Pan, Evan Dramko, Mitchell D. Miller, George N. Phillips, Anastasios Kyrillidis

TL;DR

This work tackles the crystallographic phase problem by predicting electron-density maps directly from Patterson maps using RecCrysFormer, a hybrid 3D CNN–vision transformer. The model integrates Patterson-derived data with standardized partial-structure densities through a Transformer core employing a one-way attention scheme, and it leverages a recycling training loop that reuses refinement outputs as template inputs to progressively improve accuracy. Empirical results on synthetic 15-residue fragments show substantial gains in Pearson correlation $PC(\mathbf{e}, \mathbf{e}')$ and reductions in mean phase error after recycling, with additional improvements on a variable-resolution dataset. The approach demonstrates a promising ML-assisted pathway toward crystallographic structure determination and highlights clear directions for scaling to larger systems and diverse space groups.
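The Pearson correlation $PC(\mathbf{e}, \mathbf{e}')$ mentioned above can be illustrated with a minimal sketch. It assumes the metric is the ordinary voxel-wise Pearson coefficient between a ground-truth map and a predicted map sampled on the same grid; the function name `pearson_correlation` and the toy arrays are illustrative and not taken from the paper.

```python
import numpy as np

def pearson_correlation(e_true, e_pred):
    """Voxel-wise Pearson correlation between two density maps of equal shape."""
    a = np.asarray(e_true, dtype=float).ravel()
    b = np.asarray(e_pred, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must have the same number of voxels")
    # Center both maps so the score ignores a constant density offset.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return float((a @ b) / denom)

# Toy maps (assumption: random values stand in for real electron densities).
rng = np.random.default_rng(0)
e = rng.random((8, 8, 8))
pc_same = pearson_correlation(e, 2.0 * e + 1.0)   # rescaled, shifted copy
pc_flipped = pearson_correlation(e, -e)           # sign-inverted copy
```

Because the coefficient is invariant to positive scaling and constant offsets, it rewards agreement in the shape of the density rather than its absolute scale.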

Abstract

Determining protein structures at an atomic level remains a significant challenge in structural biology. We introduce $\texttt{RecCrysFormer}$, a hybrid model that exploits the strengths of transformers with the aim of integrating experimental and ML approaches to protein structure determination from crystallographic data. $\texttt{RecCrysFormer}$ leverages Patterson maps and incorporates known standardized partial structures of amino acid residues to directly predict electron density maps, which are essential for constructing detailed atomic models through crystallographic refinement processes. $\texttt{RecCrysFormer}$ benefits from a "recycling" training regimen that iteratively incorporates results from crystallographic refinements and previous training runs as additional inputs in the form of template maps. Using a preliminary dataset of synthetic peptide fragments based on the Protein Data Bank, $\texttt{RecCrysFormer}$ achieves good accuracy in structural predictions and shows robustness against variations in crystal parameters, such as unit cell dimensions and angles.
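To make the model's input concrete, here is a minimal sketch of how a Patterson map relates to an electron density. It uses the standard definition: the Patterson function is the inverse Fourier transform of the squared structure-factor amplitudes, so it can be computed without phases. The sketch assumes a density sampled on a periodic grid; `patterson_map` and the random toy density are illustrative, and this is not the paper's preprocessing pipeline.

```python
import numpy as np

def patterson_map(density):
    """Patterson map of a periodic real density: inverse FFT of |F|^2."""
    structure_factors = np.fft.fftn(density)
    intensities = np.abs(structure_factors) ** 2  # phases are discarded here
    return np.fft.ifftn(intensities).real

# Toy density (assumption: random values stand in for a real unit cell).
rng = np.random.default_rng(0)
rho = rng.random((6, 6, 6))

p = patterson_map(rho)
# A translated density and an inverted density give the same Patterson map,
# which is one way to see why recovering the density from it is ambiguous.
p_shifted = patterson_map(np.roll(rho, 3, axis=1))
p_inverted = patterson_map(rho[::-1, ::-1, ::-1])
```

The map depends only on amplitudes, so distinct densities can share one Patterson map; recovering the density from it is the phase problem the model addresses.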

Paper Structure

This paper contains 14 sections, 12 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: A representation of the process for determining crystal structures. Complete structure factors are obtained from diffraction patterns through various methods. By applying a Fourier transform to these, the electron density within the unit cell is calculated. The initial model is then iteratively refined through comparison with experimental measurements.
  • Figure 2: Mathematical representation of the preprocessing steps for Patterson maps and partial structures.
  • Figure 3: Overview of our transformer layer.
  • Figure 4: RecCrysFormer meta-algorithm. Arrows show the information flow among the various components.
  • Figure 5: A test set example (4AZ3_1.pd_11) representing the failure case where SHELXE could not produce a refined map. The underlying ground truth model is shown in red. Our first recycling formulation only slightly improves most aspects, but the prediction after our modified run shows clear improvement in several details. See the highlighted box for a region that demonstrates this.
  • ...and 3 more figures