Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?
Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu
TL;DR
This paper addresses the need for a cross-algorithm benchmark in structure-based drug design (SBDD) by evaluating 16 models spanning 1D, 2D, and 3D design paradigms on both molecular properties and docking affinities to seven target proteins. It demonstrates that 1D/2D ligand-centric approaches, when docking is treated as a black-box oracle, can match or exceed 3D methods, with AutoGrow4 (a 2D genetic algorithm) achieving the strongest overall optimization performance. No single method dominates all metrics (docking, drug-likeness, synthesizability, and generative quality); nonetheless, genetic algorithms, particularly AutoGrow4, show robust docking performance and favorable synthetic accessibility. The findings advocate for hybrid SBDD designs that integrate genetic algorithms with other computational strategies to optimize both binding affinity and molecular properties, with practical implications for accelerating drug discovery and for the design of future benchmarks.
Abstract
Currently, the field of structure-based drug design (SBDD) is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. To fill this gap, we establish a benchmark that evaluates sixteen models across these algorithmic foundations by assessing the pharmaceutical properties of the generated molecules and their docking affinities with specified target proteins. We highlight the unique advantages of each algorithmic approach and offer recommendations for the design of future SBDD models. We emphasize that 1D/2D ligand-centric drug design methods can be used in SBDD by treating the docking function as a black-box oracle, an option that is typically neglected. The empirical results show that 1D/2D methods achieve competitive performance compared with 3D-based methods that explicitly use the 3D structure of the target protein. Moreover, AutoGrow4, a 2D molecular graph-based genetic algorithm, dominates SBDD in terms of optimization ability. The relevant code is available at https://github.com/zkysfls/2024-sbdd-benchmark.
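To illustrate what treating the docking function as a black-box oracle can look like, the following is a minimal sketch of a genetic algorithm that only ever calls a scoring function on candidate strings and never inspects the protein structure. It is not AutoGrow4 and not the benchmark's code. The names `toy_docking_oracle`, `mutate`, `crossover`, and `optimize`, the four-letter token alphabet, and the scoring rule are all hypothetical; in practice the oracle would wrap a real docking program and the strings would be valid molecular representations.

```python
import random
from typing import Callable, List

ALPHABET = "CNOF"  # toy token set (assumption); not a real chemical grammar


def toy_docking_oracle(s: str) -> float:
    """Hypothetical stand-in for a docking score; lower is better.

    A real oracle would run a docking program on the molecule. Here the
    score simply rewards 'N' and 'O' tokens so the example is runnable.
    """
    return -1.0 * s.count("N") - 0.5 * s.count("O")


def mutate(s: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random token."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: prefix of a joined to suffix of b."""
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def optimize(
    oracle: Callable[[str], float],
    population: List[str],
    generations: int = 30,
    offspring: int = 20,
    seed: int = 0,
) -> List[str]:
    """Evolve the population using only oracle calls (black-box access)."""
    rng = random.Random(seed)
    pop = sorted(set(population), key=lambda s: (oracle(s), s))
    size = len(pop)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            a, b = rng.sample(pop, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Elitist selection: keep the best-scoring unique candidates.
        merged = set(pop) | set(children)
        pop = sorted(merged, key=lambda s: (oracle(s), s))[:size]
    return pop


if __name__ == "__main__":
    start = ["CCCCCCCC", "CCCCNCCC", "CFCFCFCF", "OCCCCCCC"]
    final = optimize(toy_docking_oracle, start)
    print(final[0], toy_docking_oracle(final[0]))
```

Because the optimizer depends only on the `oracle` callable, the same loop applies to any 1D or 2D representation for which mutation and crossover operators are defined. The elitist selection step guarantees that the best score in the population never gets worse from one generation to the next.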
