Dockformer: A transformer-based molecular docking paradigm for large-scale virtual screening
Zhangfan Yang, Junkai Ji, Shan He, Jianqiang Li, Tiantian He, Ruibin Bai, Zexuan Zhu, Yew Soon Ong
TL;DR
Dockformer tackles the challenge of efficient and accurate large-scale virtual screening with a transformer-based, end-to-end docking framework that fuses multimodal information from 2D graph topology and 3D geometry. It uses two encoders, one for proteins and one for ligands, a binding module that models intermolecular interactions, and a structure module that directly generates ligand coordinates with accompanying confidence measures, avoiding expensive optimization or denoising steps. On standard benchmarks, Dockformer achieves high docking success rates (90.53% on the PDBbind core set and 82.71% on PoseBusters) with inference that is orders of magnitude faster than traditional approaches and many DL-based methods. The model also demonstrates practical utility in large-scale screening, exemplified by a near-real-time screen of ~1.2 million ChEMBL compounds against the coronavirus main protease (M_pro), and its confidence scores correlate with docking accuracy, supporting its use in accelerated drug discovery workflows.
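To illustrate how per-pose confidence measures might be used in a large-scale screen, the following is a minimal sketch of a triage step that filters and ranks docked compounds by confidence. It is not Dockformer's API: the `DockedPose` record, the `shortlist` function, the threshold, and the assumption that a higher confidence value means a more reliable pose are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class DockedPose(NamedTuple):
    """Hypothetical record for one docked compound (not a Dockformer type)."""
    compound_id: str   # library identifier, e.g. a ChEMBL ID
    confidence: float  # model-reported confidence; assumed higher = more reliable


def shortlist(poses: List[DockedPose], min_confidence: float, top_k: int) -> List[DockedPose]:
    """Keep poses at or above min_confidence, then return the top_k most confident."""
    kept = [p for p in poses if p.confidence >= min_confidence]
    # Sort a copy in descending order of confidence; the input list is left untouched.
    kept.sort(key=lambda p: p.confidence, reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Made-up identifiers and scores, purely for demonstration.
    library = [
        DockedPose("cmpd-A", 0.91),
        DockedPose("cmpd-B", 0.32),
        DockedPose("cmpd-C", 0.74),
    ]
    for pose in shortlist(library, min_confidence=0.5, top_k=2):
        print(pose.compound_id, pose.confidence)
```

In practice the confidence cutoff and the shortlist size would be chosen per target, for example by checking how confidence tracks pose accuracy on a held-out set.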
Abstract
Molecular docking is a crucial step in drug development that enables the virtual screening of compound libraries to identify potential ligands for target proteins of interest. However, the computational cost of traditional docking models grows with the size of the compound library. Recently, deep learning algorithms have provided data-driven models that speed up the docking process. Unfortunately, few of these models achieve better screening performance than traditional models. Therefore, a novel deep learning-based docking approach named Dockformer is introduced in this study. Dockformer leverages multimodal information to capture the geometric topology and structural knowledge of molecules and directly generates binding conformations with corresponding confidence measures in an end-to-end manner. The experimental results show that Dockformer achieves success rates of 90.53% and 82.71% on the PDBbind core set and PoseBusters benchmarks, respectively, together with a more than 100-fold increase in inference speed, outperforming almost all state-of-the-art docking methods. In addition, the ability of Dockformer to identify inhibitors of the coronavirus main protease is demonstrated in a real-world virtual screening scenario. Given its high docking accuracy and screening efficiency, Dockformer can be regarded as a powerful and robust tool in the field of drug design.
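For readers unfamiliar with how a docking success rate is typically computed, the sketch below shows one common convention: a predicted pose counts as a success when its heavy-atom RMSD to the reference ligand pose is below 2 Å. The abstract does not state the criterion used, so the 2 Å threshold, the function names, and the plain RMSD without symmetry correction are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms (no symmetry correction)."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("pred and ref must be non-empty and of equal length")
    squared = sum(
        (p[axis] - r[axis]) ** 2
        for p, r in zip(pred, ref)
        for axis in range(3)
    )
    return math.sqrt(squared / len(pred))


def success_rate(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Percentage of poses whose RMSD is below threshold (in angstroms, assumed)."""
    if len(rmsds) == 0:
        raise ValueError("rmsds must be non-empty")
    hits = sum(1 for value in rmsds if value < threshold)
    return 100.0 * hits / len(rmsds)


if __name__ == "__main__":
    # Made-up coordinates for a two-atom toy ligand.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(round(rmsd(predicted, reference), 3))
    # Made-up per-complex RMSD values.
    print(success_rate([0.5, 1.9, 2.5, 4.0]))
```

Benchmarks such as PoseBusters may additionally require physically valid poses, so a full evaluation can involve more than the RMSD check shown here.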
