Assessment of scoring functions for computational models of protein-protein interfaces

Jacob Sumner, Grace Meng, Naomi Brandt, Alex T. Grigas, Andrés Córdoba, Mark D. Shattuck, Corey S. O'Hern

TL;DR

The paper benchmarks seven PPI scoring functions on rigid-body re-docked heterodimer models sampled uniformly in DockQ, quantifying how the scores correlate with DockQ across targets and datasets. It identifies two physical interface features, the number of interface contacts $N_c$ and the separability $S$, that strongly influence scoring difficulty, and demonstrates that a support vector regression (SVR) model using these two features matches or exceeds existing scoring functions. It shows that sampling models uniformly in DockQ is crucial for fair evaluation and that scoring performance degrades as the monomers move away from their bound conformations, highlighting limitations for flexible docking. The authors advocate incorporating physically discriminative features into scoring functions to improve PPI prediction and CAPRI-style assessments, and suggest future work on physics-informed learning and integration with graph neural networks (GNNs).
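Sampling models uniformly in DockQ can be pictured as stratified subsampling of a larger pool of models. The minimal Python sketch below assumes equal-width DockQ bins on [0, 1] and draws the same number of models from every non-empty bin (by default, the size of the smallest non-empty bin). The function name `uniform_dockq_subsample`, the default of 10 bins, and the per-bin rule are hypothetical choices for illustration, not the authors' procedure, and the energy-minimization step discussed in Figure 3 is not modeled.

```python
import random
from collections import defaultdict


def uniform_dockq_subsample(dockq, n_bins=10, per_bin=None, seed=0):
    """Return sorted indices of models whose DockQ values are roughly uniform on [0, 1].

    Hypothetical sketch: models are grouped into n_bins equal-width DockQ bins,
    and up to `per_bin` models (default: size of the smallest non-empty bin)
    are drawn without replacement from each non-empty bin.
    """
    bins = defaultdict(list)
    for i, q in enumerate(dockq):
        # DockQ lies in [0, 1]; a value of exactly 1.0 is placed in the last bin.
        b = min(int(q * n_bins), n_bins - 1)
        bins[b].append(i)
    if not bins:
        return []
    k = per_bin if per_bin is not None else min(len(v) for v in bins.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    chosen = []
    for b in sorted(bins):
        members = bins[b]
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)
```

With a pool dominated by low-DockQ models, for example 50 models near DockQ 0.05, 20 near 0.15, and 3 near 0.95, the default call keeps 3 models from each of the three occupied bins, so near-native models are no longer swamped by incorrect ones.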

Abstract

A goal of computational studies of protein-protein interfaces (PPIs) is to predict the binding site between two monomers that form a heterodimer. The simplest version of this problem is to rigidly re-dock the bound forms of the monomers, which involves generating computational models of the heterodimer and then scoring them to identify the most native-like models. Scoring functions have previously been assessed using rank- and classification-based metrics; however, these metrics are sensitive to the number and quality of models in the scoring function training set. We assess the accuracy of seven PPI scoring functions by comparing their scores to a measure of structural similarity to the x-ray crystal structure (i.e., the DockQ score) for a non-redundant set of heterodimers from the Protein Data Bank. For each heterodimer, we generate re-docked models uniformly sampled over DockQ and calculate the Spearman correlation between the PPI scores and DockQ. For some targets, the scores and DockQ are highly correlated; for many others, the correlations are weak. Several physical features can explain the difference between difficult- and easy-to-score targets. For example, the score and DockQ are strongly correlated for targets with highly intertwined monomers and many interface contacts. We also develop a new score based on only three physical features that matches or exceeds the performance of current PPI scoring functions. These results emphasize that PPI prediction can be improved by focusing on correlations between the PPI score and DockQ and by incorporating more discriminating physical features into PPI scoring functions.
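The per-target Spearman correlation between a PPI score and DockQ is the Pearson correlation of their ranks. The stdlib-only sketch below uses hypothetical function names and assumes the common convention that tied values receive the average of the ranks they occupy. Because lower scores are taken to be more native-like, a well-performing scoring function gives $\rho$ close to $-1$.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of entries tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman rho is undefined for a constant input")
    return cov / math.sqrt(vx * vy)
```

For example, `spearman_rho([0.1, 0.4, 0.7, 0.9], [-5.0, -12.0, -20.0, -31.0])` returns `-1.0`, since the score decreases monotonically as DockQ increases.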

Paper Structure (17 sections, 12 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: Visualization of (a) a heterodimer x-ray crystal structure (PDB ID: 1ATN), (b) a "high" CAPRI quality rigid-body re-docked model, and (c) an "incorrect" CAPRI quality rigid-body re-docked model for this target. The blue and green colors indicate the receptor and ligand of the heterodimer, respectively, and the DockQ scores are provided for the target and models.
  • Figure 2: The effective hit rate fraction $h_k^R$ is plotted versus the maximum considered rank $R$ of the models ordered according to their Rosetta score for increasing numbers of computational models $N_m$ (from blue to yellow) for (a) 3RCZ and (b) 3YGS. The insets show the joint probability distribution $P({\rm DockQ},{\rm Rosetta})$ of DockQ and Rosetta scores for exhaustively sampled computational models of heterodimer targets 3RCZ and 3YGS. The color scale from light yellow to dark red indicates increasing probability. (c) Scatterplot of ZRank2 versus DockQ for PDB 3RCZ with Spearman correlation $\rho = -0.303$. (d) ROC curve for the data in (c) with true positive cutoff DockQ $\geq 0.23$ and ${\rm AUC} = 0.825$. (e) The Spearman correlation $\rho$ between DockQ and each of the seven scoring functions plotted versus the AUC for all targets. Each color represents a different target, and the blue dotted line represents ${\rm AUC}=-0.5\rho+0.5$.
  • Figure 3: Improvements in the evaluation of PPI scoring functions due to sampling models uniformly in DockQ. (a) Probability distribution $P({\rm DockQ})$ of DockQ for computational models for all targets obtained by exhaustive sampling of models (gray circles), uniformly sampling DockQ before energy minimization (blue triangles), and uniformly sampling DockQ after energy minimization (black crosses). (b) ${\rm AUC}$ plotted versus $\rho$ for models that uniformly sample DockQ (after energy minimization) for each scoring function. The blue dotted line gives ${\rm AUC}=-0.5\rho+0.5$. (c) $\Delta \rho = \rho_{u} - \rho_{e}$ plotted versus $\Delta {\rm AUC} = {\rm AUC}_u - {\rm AUC}_e$, where $\rho_{u}$ ($\rho_e$) is the Spearman correlation for the models that were uniformly (exhaustively) sampled in DockQ and ${\rm AUC}_u$ (${\rm AUC}_e$) is the area under the ROC curve for the models that were uniformly (exhaustively) sampled.
  • Figure 4: The Spearman correlation $\rho$ between DockQ and PPI score for each target and scoring function. Targets are ordered from left to right by increasing $\langle \rho \rangle$ (black line), averaged over the six scoring functions listed. (Note that since $\rho < 0$, increasing $\rho$ implies decreasing the magnitude of the correlation.) The standard deviations of the scores for each target are indicated by the black vertical lines. The numbers on the horizontal axis correspond to PPI targets listed in Table \ref{tab:pdb_labels} in Appendix \ref{app:a}.
  • Figure 5: The shape of the DockQ landscape provides insight into the effectiveness of scoring computational models. (a) A schematic of rigid-body re-docked models of heterodimer PDB 2GRN, which illustrates the variables that define the DockQ landscape. The receptor (gray shading) is located at the origin $O$ with the same orientation as that in the x-ray crystal structure. The location of the ligand, with the same orientation as that in the x-ray crystal structure, is denoted using spherical coordinates (the distance from the origin $r$, the polar angle $\theta$, and the azimuthal angle $\phi$) for two models (blue triangle and purple square) and the x-ray crystal structure (red star). DockQ (increasing from dark blue to dark red) is plotted for all computational models of (b) 2GRN and (c) 3WHQ, with Spearman correlations between DockQ and ZRank2 $\langle \rho \rangle = -0.41$ and $-0.93$, respectively. The models were scored at arbitrary $x$-, $y$-, and $z$-coordinates, but plotted at $r=R$, where $R$ is the separation between the centers of mass of the receptor and ligand in the x-ray crystal structure. The white "x" denotes the location of the ligand in the x-ray crystal structure for 3WHQ. (d) The Spearman correlation $\langle \rho\rangle$ (between DockQ and PPI score) averaged over scoring functions plotted versus the relative anisotropy $\kappa^2$ of the DockQ landscape for all $84$ targets. The values of $\rho$ and $\kappa^2$ are highlighted for 2GRN (cross) and 3WHQ (star).
  • ...and 9 more figures
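The ROC analysis in Figure 2(d) treats models with DockQ $\geq 0.23$ as true positives (0.23 is the DockQ threshold for CAPRI "acceptable" quality). A minimal sketch of the resulting AUC, assuming lower scores rank models as more native-like, uses the Mann-Whitney form: the fraction of positive/negative model pairs in which the positive model scores lower, with ties counted as one half. The helper `reference_line_auc` evaluates the dotted line ${\rm AUC}=-0.5\rho+0.5$ from Figures 2(e) and 3(b) for comparison. Both function names are hypothetical, and the pairwise loop costs $O(N_+ N_-)$, so it is meant for clarity rather than speed.

```python
def roc_auc_lower_is_better(scores, dockq, cutoff=0.23):
    """AUC = P(score of a random positive < score of a random negative); ties count 1/2.

    Positives are models with DockQ >= cutoff. Assumes a lower score means a
    more native-like model, as for energy-like PPI scoring functions.
    """
    pos = [s for s, q in zip(scores, dockq) if q >= cutoff]
    neg = [s for s, q in zip(scores, dockq) if q < cutoff]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative model")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0  # positive correctly ranked ahead of negative
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


def reference_line_auc(rho):
    """Dotted reference line AUC = -0.5 * rho + 0.5 (rho = -1 -> 1, rho = 0 -> 0.5)."""
    return -0.5 * rho + 0.5
```

A target whose scores decrease monotonically with DockQ gives an AUC of 1 and $\rho=-1$, which lies on the reference line; points far from the line indicate targets where the threshold-based AUC and the rank correlation disagree.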