From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics
Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan
TL;DR
This work addresses the gap between structure-based drug design (SBDD) outputs that look strong in theory and their real-world utility. It shifts evaluation away from sole reliance on Vina docking scores toward a three-tier framework: binding affinity estimation, similarity to known actives and FDA-approved drugs, and virtual screening ability. It introduces detailed metrics, including the $\Delta$ score, DrugCLIP, Active/FDA Similarity, and BEDROC/EF (enrichment factor), and constructs a realistic benchmark from real crystal structures (PDBbind) with rigorous pocket diversity and a test set drawn from DUD-E and LIT-PCBA. Comprehensive experiments across five baselines (LiGAN, AR, Pocket2Mol, TargetDiff, MolCRAFT) reveal that high docking scores often do not translate into practical usefulness, highlighting a substantial gap between theory and deployment. The proposed dataset and metrics aim to guide future SBDD models toward outputs that are not only score-competitive but also synthesizable, drug-like, and reusable in virtual screening, thereby shortening the path from theory to therapy.
Abstract
Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, applying SBDD models in real-world drug development remains challenging because synthesizing and testing the generated molecules is complex. The reliability of the Vina docking score, the current standard for assessing binding ability, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that assesses the similarity of generated molecules to known active compounds, introduces a virtual screening-based metric for practical deployment capability, and re-evaluates binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short on practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.
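
The TL;DR lists BEDROC and EF among the virtual screening metrics. As a rough illustration of what an enrichment factor measures, the sketch below computes EF at a chosen fraction of a ranked library: the active rate in the top-ranked slice divided by the active rate in the whole library. This is a minimal sketch rather than the authors' implementation; the function name `enrichment_factor`, the higher-score-is-better convention, and the ceiling rule for the cutoff size are all assumptions made for the example.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at `fraction` of a ranked library (illustrative sketch).

    scores    : one score per compound; higher is assumed to mean "more likely active".
    is_active : 0/1 (or bool) labels aligned with `scores`.
    fraction  : share of the library treated as the top-ranked slice, in (0, 1].
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Size of the top slice; rounding up is an assumption, other conventions exist.
    n_top = max(1, math.ceil(fraction * n_total))

    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Active rate in the top slice relative to the active rate in the full library.
    return (hits_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library of 10 compounds with 2 actives, both ranked first.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

An EF of 1 corresponds to a ranking no better than random selection, and the maximum attainable value at a given fraction is bounded by the reciprocal of that fraction. BEDROC, which weights early recognition more smoothly, is not covered by this sketch.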
