On the Reliability of AI Methods in Drug Discovery: Evaluation of Boltz-2 for Structure and Binding Affinity Prediction

Shunzhou Wan; Xibei Zhang; Xiao Xue; Peter V. Coveney

TL;DR

Boltz-2 is fast enough for initial screening but lacks the energetic resolution required for lead identification, so physics-based methods remain necessary to assess the reliability of AI-derived models and to refine them.

Abstract

Despite continuing hype about the role of AI in drug discovery, no "AI-discovered drugs" have so far received regulatory approval. Here we assess one of the latest AI-based tools in this domain. The ability to rapidly predict protein-ligand structures and binding affinities is pivotal for accelerating drug discovery. Boltz-2, a recently developed biomolecular foundation model, aims to bridge the gap between AI efficiency and physics-based precision through a joint "co-folding" approach. In this study, we provide an extensive evaluation of Boltz-2 using two large-scale datasets: 16,780 compounds for 3CLPro and 21,702 compounds for TNKS2. We compare the structures predicted by Boltz-2 with those from traditional docking, and its predicted binding affinities with binding free energies derived from the physics-based ESMACS protocol. Structural analysis reveals significant global RMSD variations, indicating that Boltz-2 predicts multiple protein conformations and ligand binding positions rather than a single converged pose. Energetic evaluations exhibit only weak to moderate correlations across the global datasets. Furthermore, a focused analysis of the top 100 compounds yields no significant correlation between the Boltz-2 predictions and the binding free energies from fine-grained ESMACS, alongside observed saturation differences in ligand structures. Our results show that while Boltz-2 offers substantial speed for initial screening, it lacks the energetic resolution required for lead identification. These findings highlight the necessity of employing physics-based methods to assess the reliability of AI-derived models and to refine them.
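The energetic comparison summarised above rests on correlating two sets of predicted free energies, and the caption of Figure 4 mentions standardised major axis (SMA) regression. The following is a minimal sketch of how such a comparison could be computed, not the authors' analysis code. The function name `sma_fit` and the example values are hypothetical. The only formulas assumed are the textbook ones: the Pearson coefficient r, an SMA slope of sign(r) · sd(y)/sd(x), and an intercept that places the line through the two means.

```python
import math


def sma_fit(x, y):
    """Return (slope, intercept, r) for a standardised major axis fit.

    x and y are equal-length sequences of free energies, e.g. Boltz-2
    and ESMACS values in the same units. Hypothetical helper, written
    for illustration only.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must have non-zero variance")
    r = sxy / math.sqrt(sxx * syy)
    # SMA slope: ratio of standard deviations, carrying the sign of r.
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = mean_y - slope * mean_x
    return slope, intercept, r


# Made-up numbers purely to show the call; not data from the paper.
dg_boltz = [-7.1, -8.3, -6.5, -9.0, -7.8]
dg_esmacs = [-30.2, -41.5, -35.0, -38.7, -29.9]
slope, intercept, r = sma_fit(dg_boltz, dg_esmacs)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

Unlike ordinary least squares, the SMA line treats both variables symmetrically, which is a natural choice when neither set of predictions is an error-free reference.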

Paper Structure (19 sections, 7 figures)

Introduction
Results
Global Evaluation of Structure and Binding Affinity
Structural comparison
Protein structure comparison
Ligand binding pose comparison
Reliability estimation in binding pose from Boltz-2
Boltz-2 predicted binding affinities
Correlation of predicted binding affinity between ESMACS and Boltz-2
Reproducibility of Boltz-2
Precision Analysis of Top-100 Boltz-2 Predictions
Binding poses from Boltz-2
Binding free energies
Discussion
Methods
...and 4 more sections

Figures (7)

Figure 1: Structural comparison of Boltz-2 and docking predictions for 3CLPro (a-c) and TNKS2 (d-f). Distributions are shown for pairwise RMSDs of the protein structures (a, d) and ligand poses (b, e), alongside LDDT scores (c, f) evaluating the preservation of the local atomic environment and binding site interactions.
Figure 2: Representative binding sites predicted by Boltz-2 co-folding for a) 3CLPro and b) TNKS2. The protein (cartoon) and a bound compound (yellow sticks) are from PDB 6W63 and 4UI5, respectively. Six binding sites (i-vi) are predicted for 3CLPro, with LDDT metric values of 1.8 Å, 6.4 Å, 7.8 Å, 13.0 Å, 13.2 Å and 14.3 Å (Fig. 1c), respectively. For TNKS2, all compounds but one bind at or near the X-ray-identified binding site (i, ii), and the exception binds at a different site (iii); the corresponding LDDT values (Fig. 1f) are 0.8 Å, 3.6 Å and 6.9 Å, respectively.
Figure 3: The distribution of confidence scores from the Boltz-2 predictions for both the 3CLPro and TNKS2 systems. Percentages are calculated relative to the total number of ligands in each protein system.
Figure 4: Correlation between binding free energies predicted by Boltz-2 ($\Delta G_{\text{Boltz}}$) and calculated via ESMACS ($\Delta G_{\text{ESMACS}}$) for a) 3CLPro and b) TNKS2. Dashed lines indicate standardised major axis (SMA) regressions. Data points are coloured by Boltz-2 predicted binding probabilities.
Figure 5: Binding affinity consistency between two independent runs of Boltz-2 for a) 3CLPro and b) TNKS2.
...and 2 more figures
