Dihedral Angle Adherence: Evaluating Protein Structure Predictions in the Absence of Experimental Data

Musa Azeem, Homayoun Valafar

TL;DR

The paper addresses the challenge of evaluating protein structure predictions without experimental ground-truth data. It introduces a dihedral-adherence metric, computed per residue, by mining context-specific $\phi$ and $\psi$ angle distributions from the Protein Data Bank via windowed comparisons, clustering, and Mahalanobis distance to predicted angles. The method shows a significant correlation with RMSD across CASP-14 predictions ($R^2 = 0.755$, $p<0.01$), while also revealing residue-level locations where predictions underperform, enabling targeted improvements. This reference-free evaluation framework offers practical value for guiding development and refinement of protein structure predictions, including AlphaFold outputs. It thus provides a scalable, insight-rich alternative to RMSD for monitoring and enhancing predictive accuracy in the absence of ground-truth structures.

Abstract

Determining the 3D structures of proteins is essential to understanding their behavior in the cellular environment. Computational methods for predicting protein structures have advanced, but assessing prediction accuracy remains a challenge. The traditional metric, RMSD, requires an experimentally determined reference structure and gives little insight into which parts of a prediction need improvement. We propose an alternative: analyzing dihedral angles, which bypasses the need for a reference structure of the evaluated protein. Our method segments each protein into amino acid subsequences, searches for matching subsequences across numerous known proteins, and compares the dihedral angles found there with the predicted ones to compute a metric based on Mahalanobis distance. Evaluated on many predictions, our approach correlates with RMSD and identifies the regions where a prediction can be improved. This method offers a promising route for assessing and improving protein structure predictions.
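The two core steps named in the abstract, windowing a sequence and scoring a predicted $(\phi, \psi)$ pair by Mahalanobis distance, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the reference angles are assumed to be already collected for one cluster, and the PDBMine query and clustering stages are omitted. It also assumes the cluster does not straddle the $\pm 180^\circ$ boundary, so a plain arithmetic mean is acceptable.

```python
import numpy as np


def sliding_windows(sequence, size):
    """Return every contiguous subsequence of length `size` (hypothetical helper)."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def wrap_degrees(delta):
    """Wrap angle differences into [-180, 180) so that 179 and -179 are 2 degrees apart."""
    return (np.asarray(delta, dtype=float) + 180.0) % 360.0 - 180.0


def mahalanobis_to_cluster(point, samples):
    """Mahalanobis distance from a predicted (phi, psi) pair to a cluster of reference angles.

    `samples` is an (n, 2) array of (phi, psi) values in degrees, assumed to come
    from a single cluster that does not wrap around the +/-180 degree boundary.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)            # cluster centre
    cov = np.cov(samples, rowvar=False)    # 2x2 sample covariance of (phi, psi)
    diff = wrap_degrees(np.asarray(point, dtype=float) - mean)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))


if __name__ == "__main__":
    # Windows of size 3 over the example subsequence from Figure 3.
    print(sliding_windows("LAGLTG", 3))

    # Made-up reference angles near a helical region, for illustration only.
    reference = [(-60, -40), (-62, -40), (-58, -40), (-60, -42), (-60, -38)]
    print(mahalanobis_to_cluster((-58, -40), reference))
```

A small distance means the predicted angles sit close to what known structures show for the same sequence context, while a large distance flags a residue worth inspecting. Using the covariance rather than a plain Euclidean distance lets the score account for how tightly $\phi$ and $\psi$ are constrained in that context.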

Paper Structure (15 sections, 7 figures)

Figures (7)

  • Figure 1: Overview of our methodology. Amino acid sequences and numeric figures are shown as examples.
  • Figure 2: Visualization of the location of dihedral angles, $\phi$ and $\psi$. Example shows the amino acid subsequence of residues E, F, and W. Here, a window size of 3 is illustrated for conciseness.
  • Figure 3: KDE plot of the $(\phi,\psi)$ distribution queried from PDBMine for the subsequence LAGLTG. Certain values of $\phi$ and $\psi$ are highly probable given this subsequence, while others have near-zero probability.
  • Figure 4: KDE plot of the dihedral distribution chosen as most probable, $D^{(397)}$, for residue S in the protein 7W6B. Overlaid are $(\phi, \psi)$ data points of interest. The dihedral angles of the X-ray-determined structure at this residue are shown in orange. Those of the AlphaFold prediction, the prediction T1091TS360_1, and all other predictions submitted to CASP-14 are shown in purple, green, and black, respectively. The mean of the distribution is shown in red. Dashed lines from each point of interest illustrate the calculation of the Mahalanobis distance metric. The X-ray-determined structure and the AlphaFold prediction have relatively small distances, while T1091TS360_1 and some other predictions lie much farther away.
  • Figure 5: Our calculated dihedral adherence for each residue of each prediction for the protein 6VR4. Variations in certain columns indicate key residues where many predictions disagree.
  • ...and 2 more figures