CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design

Zijun Gao, Mutian He, Shijia Sun, Hanqun Cao, Jingjie Zhang, Zihao Luo, Xiaorui Wang, Xiaojun Yao, Chang-Yu Hsieh, Chunbin Gu, Pheng Ann Heng

TL;DR

This work tackles hallucinations in state-of-the-art diffusion-based structure predictors by introducing CODE, a self-evaluating metric that tracks topological frustration via diffusion-embedding trajectories, and CONFIDE, an integrated score combining topological and energetic perspectives. CODE correlates strongly with folding kinetics, while CONFIDE outperforms pLDDT across diverse benchmarks, including ternary complexes, flexible proteins, and PDB/CASP15 datasets, and shows practical gains in binder design, enzyme-site mapping, resistance prediction, and aptamer screening. By providing unsupervised, interpretable tools that capture both energy and topology constraints, CODE and CONFIDE offer a robust framework to improve the reliability and applicability of biomolecular structure predictions in drug discovery and structural biology. The methods demonstrate broad applicability across design, screening, and functional annotation tasks, suggesting a new paradigm for unsupervised evaluation of diffusion-based biomolecular models. These contributions enable more reliable structure predictions, enhanced design strategies, and accelerated discovery workflows in computational biology.

Abstract

Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric empirically found to quantify topological frustration directly from the latent diffusion embeddings of the AlphaFold3 series of structure predictors in a fully unsupervised manner. Integrating this with pLDDT, we propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives to improve the reliability of AlphaFold3 and related models. CODE strongly correlates with protein folding rates driven by topological frustration, achieving a correlation of 0.82 compared to pLDDT's 0.33 (a relative improvement of 148%). CONFIDE significantly enhances the reliability of quality evaluation in molecular glue structure prediction benchmarks, achieving a Spearman correlation of 0.73 with RMSD, compared to pLDDT's correlation of 0.42, a relative improvement of 73.8%. Beyond quality assessment, our approach applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation-induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling. By combining data-driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems, offering robust and versatile tools to refine structure predictions, advance structural biology, and accelerate drug discovery.
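The relative improvements quoted in the abstract can be reproduced from the reported correlations. The short sketch below assumes the usual definition, (new - baseline) / baseline, which the abstract does not state explicitly; the function name `relative_improvement` is illustrative and not taken from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent.

    Assumed definition: (new - baseline) / baseline * 100.
    """
    return (new - baseline) / baseline * 100.0


# Correlations reported in the abstract.
code_vs_plddt = relative_improvement(0.82, 0.33)     # folding-rate correlation
confide_vs_plddt = relative_improvement(0.73, 0.42)  # Spearman correlation with RMSD

print(f"CODE vs pLDDT:    {code_vs_plddt:.1f}%")
print(f"CONFIDE vs pLDDT: {confide_vs_plddt:.1f}%")
```

Under this definition the two values come out at roughly 148% and 73.8%, consistent with the figures in the abstract.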

Paper Structure

This paper contains 37 sections, 17 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Overview of CODE and CONFIDE. (a) The computational process of the CODE and CONFIDE modules. First, the sequence of any biological molecule is input to an AlphaFold3-series model to obtain the PairFormer embedding. The embedding then guides the denoising of noise into the corresponding structure. We construct the trajectory of embedding changes in the Denoising Module and calculate the CODE score to characterize topological frustration. The generated structure is passed to the Confidence Module to obtain a confidence score representing energetic frustration. Finally, the CONFIDE Module combines the two frustration measures into a single, more comprehensive CONFIDE score. (b) Application scenarios of CODE and CONFIDE. For hallucination detection in structure prediction, we evaluated six challenging datasets covering almost all classes of biological molecules. We also extended the evaluation to five application scenarios covering three types of tasks: binder design, enzyme site detection, and complex screening. (c) Workflow for applying CODE to hallucination-based binder design. (d) Schematic of CODE for residue-level site detection. (e) Schematic of CODE/CONFIDE for virtual screening.
  • Figure 1: The predicted structures of the remaining four proteins in the main text.
  • Figure 2: CODE implicitly encodes the protein folding dynamics mediated by topological frustration. (a) Three proteins whose folding rates differ by orders of magnitude are shown. AcP has mild mixing entropy early in the folding process. A large bottleneck region slows the folding of this protein around the transition-state barrier. SH3 is a slower folding protein and shows little mixing entropy initially, and then a small bump appears in the route measure curve. Psbd is a very fast folding protein and shows an almost constant mixing entropy measure throughout folding. (b) Correlation analysis among CODE, pLDDT and log V with linear regression fits and 95% confidence intervals. The first panel shows a strong Spearman correlation between CODE and log V. The second shows a weak Spearman correlation between pLDDT and log V. The third shows only a weak correlation of -0.37 between CODE and pLDDT, indicating that they depict the energy landscape of protein folding from different perspectives. (c) Protein structures with fast folding rates, showing their names and specific folding rates. (d) Protein structures with intermediate folding rates, showing their names and specific folding rates. (e) Protein structures with slower folding rates, showing their names and specific folding rates.
  • Figure 2: Performance analysis on PROTAC structure prediction. (a) Radar chart of the five evaluation metrics of pLDDT, CODE and CONFIDE. (b) Histogram of classification evaluation metrics of pLDDT, CODE and CONFIDE. (c) Histogram of correlation evaluation metrics of pLDDT, CODE and CONFIDE. (d) Spearman correlation between the number of atomic conflicts and pLDDT/CODE. (e) Scatter plots showing the fit and distribution of pLDDT, CODE, and CONFIDE against RMSD, including linear regression fits and 95% confidence intervals, as well as AUROC for different classification thresholds. (f) Scatter plots of the fit and distribution of pLDDT and CODE with RMSD, incorporating linear regression fits and 95% confidence intervals. (g-h) The true structure, predicted structure, and predicted structure colored by pLDDT of 7JTP and 6BN7 in PROTAC are shown from left to right. The red balls in the middle figure indicate atomic conflicts.
  • Figure 3: Performance analysis on ternary complex structure prediction. (a) Radar chart of the five evaluation metrics of three scores (pLDDT, CODE and CONFIDE). (b) Histogram of classification evaluation metrics of three scores. (c) Histogram of correlation evaluation metrics of three scores. (d) Spearman correlation between the number of atomic conflicts and pLDDT/CODE. The ternary complex structure predicted by Boltz1 has physically abnormal local conformations, such as steric conflicts caused by atoms being less than 1.5 angstroms apart in space. CODE can capture the high topological frustration caused by these conformations, thus significantly improving the performance of both systems compared to pLDDT. (e) Scatter plots of the fit and distribution of pLDDT, CODE and CONFIDE with RMSD, including linear regression lines with 95% confidence intervals, as well as AUROC for different classification thresholds. (f) Scatter plots of the fit and distribution of pLDDT and CODE with the number of atom clashes, including linear regression lines with 95% confidence intervals. (g) The true structure, predicted structure, and predicted structure colored by pLDDT of 6M93 in MGD are shown from left to right. The red balls in the middle figure indicate atomic conflicts.
  • ...and 7 more figures
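The Figure 3 caption describes steric conflicts as atoms lying less than 1.5 angstroms apart. As a minimal illustration of that criterion only, the sketch below counts atom pairs closer than a distance threshold. It is not the authors' clash-detection procedure: the function name `count_clashes` and the toy coordinates are hypothetical, and the sketch deliberately ignores covalent bonding, which a practical check would need to exclude because bonded atoms routinely sit within this distance.

```python
from itertools import combinations
from math import dist


def count_clashes(coords, threshold=1.5):
    """Count unordered atom pairs closer than `threshold` (in angstroms).

    `coords` is a sequence of (x, y, z) tuples. Every pair is checked,
    so the cost is quadratic in the number of atoms. Covalently bonded
    pairs are NOT excluded here (a simplifying assumption).
    """
    return sum(1 for a, b in combinations(coords, 2) if dist(a, b) < threshold)


# Hypothetical toy coordinates, not taken from any real structure.
atoms = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),  # 1.0 angstrom from the first atom
    (5.0, 0.0, 0.0),
    (5.5, 0.0, 0.0),  # 0.5 angstrom from the third atom
]
print(count_clashes(atoms))
```

For structures with many atoms, a spatial index such as a k-d tree or cell list would replace the all-pairs loop; the quadratic version is kept here for clarity.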