Completion of partial structures using Patterson maps with the CrysFormer machine learning model
Tom Pan, Evan Dramko, Mitchell D. Miller, Anastasios Kyrillidis, George N. Phillips
TL;DR
The paper tackles the crystallographic phase problem by tightly coupling X-ray diffraction data with deep learning, introducing CrysFormer, a hybrid 3D vision transformer–CNN that predicts electron density maps from Patterson maps while leveraging partial AlphaFold-derived structure templates. A large synthetic dataset of protein fragments embedded in crystal-like unit cells is used to train the model, demonstrating improvements in phase accuracy and the completion of missing template regions relative to a post-sigma baseline. Key findings show that CrysFormer yields higher Pearson correlations between predicted and ground-truth densities and reduces phase errors, including in challenging cases with poorly aligned AFDB templates, illustrating the practical potential of integrating experimental data with ML-based predictions. This work advances protein structure determination by enabling more accurate density-based refinements that fuse experimental measurements with deep learning templates, potentially accelerating structure elucidation in cases with incomplete data.
Abstract
Protein structure determination has long been one of the primary challenges of structural biology, to which deep machine learning (ML)-based approaches have increasingly been applied. However, these ML models generally do not directly incorporate experimental measurements, such as X-ray crystallographic diffraction data. To address this, we explore an approach that more tightly couples traditional crystallographic methods with recent ML-based methods, by training a hybrid 3D vision transformer and convolutional network on inputs from both domains. We make use of two distinct input constructs: Patterson maps, which are directly obtainable from crystallographic data, and "partial structure" template maps derived from predicted structures deposited in the AlphaFold Protein Structure Database, from which residues are subsequently omitted. With these, we predict electron density maps that are then post-processed into atomic models through standard crystallographic refinement processes. Introducing an initial dataset of small protein fragments taken from Protein Data Bank entries and placed in hypothetical crystal settings, we demonstrate that our method is effective at improving the phases of the crystallographic structure factors, completing the regions missing from partial structure templates, and improving the agreement of the electron density maps with the ground-truth atomic structures.
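To illustrate why Patterson maps are "directly obtainable from crystallographic data", the following minimal sketch computes one on a toy grid. It relies on the standard result that the Patterson function is the Fourier transform of the measured intensities |F|^2, so no phases are needed. The function name `patterson_from_density`, the 8x8x8 grid, and the two point "atoms" are hypothetical choices for illustration only; this is not the paper's data pipeline.

```python
import numpy as np

def patterson_from_density(rho):
    """Toy Patterson map: inverse FFT of the squared structure-factor amplitudes.

    In a real experiment only |F|^2 is measured; here we derive it from a
    known density purely to have something to transform (an assumption of
    this sketch).
    """
    F = np.fft.fftn(rho)              # complex structure factors (amplitude and phase)
    intensities = np.abs(F) ** 2      # phases are discarded at this step
    return np.fft.ifftn(intensities).real

# Two hypothetical point "atoms" in a periodic 8x8x8 unit cell.
rho = np.zeros((8, 8, 8))
rho[1, 2, 3] = 6.0
rho[4, 4, 1] = 8.0

P = patterson_from_density(rho)

# Peaks sit at the origin and at the two interatomic vectors (+/-), modulo the cell.
print(P[0, 0, 0], P[3, 2, 6], P[5, 6, 2])
```

The map has peaks at interatomic vectors rather than at atomic positions, and it is unchanged when the density is translated within the cell. That lost positional information is what the phase problem refers to, and it is why the model is given a partial structure template in addition to the Patterson map.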
