RapidDock: Unlocking Proteome-scale Molecular Docking

Rafał Powalski, Bazyli Klockiewicz, Maciej Jaśkowski, Bartosz Topolski, Paweł Dąbrowski-Tumański, Maciej Wiśniewski, Łukasz Kuciński, Piotr Miłoś, Dariusz Plewczynski

TL;DR

We examine the key features that let RapidDock apply the transformer architecture to molecular docking: relative distance embeddings of 3D structures in the attention matrices, pre-training on protein folding, and a custom loss function invariant to molecular symmetries.

Abstract

Accelerating molecular docking -- the process of predicting how molecules bind to protein targets -- could boost small-molecule drug discovery and revolutionize medicine. Unfortunately, current molecular docking tools are too slow to screen potential drugs against all relevant proteins, which often results in missed drug candidates or unexpected side effects in clinical trials. To address this gap, we introduce RapidDock, an efficient transformer-based model for blind molecular docking. RapidDock is at least $100\times$ faster than existing methods without compromising accuracy. On the PoseBusters and DockGen benchmarks, our method achieves success rates ($\text{RMSD} < 2$ Å) of $52.1\%$ and $44.0\%$, respectively. The average inference time is $0.04$ seconds on a single GPU, highlighting RapidDock's potential for large-scale docking studies. We examine the key features of RapidDock that make the transformer architecture usable for molecular docking, including relative distance embeddings of 3D structures in the attention matrices, pre-training on protein folding, and a custom loss function invariant to molecular symmetries.
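To make the success-rate metric concrete, here is a minimal sketch of how a fraction of poses with RMSD below 2 Å could be computed. The function names and toy coordinates are illustrative assumptions, not part of RapidDock or of the benchmark tooling, and the sketch assumes atoms are listed in the same order in both poses. It ignores the symmetry correction that docking evaluations commonly apply.

```python
import math

def rmsd(pred, ref):
    """RMSD in Å between two poses given as equally ordered lists of (x, y, z)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))

def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, reference) pose pairs with RMSD below threshold."""
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) < threshold)
    return hits / len(pairs)

# Toy coordinates, made up for illustration only.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
close = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # every atom shifted by 0.5 Å
far = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]    # every atom shifted by 3.0 Å

rate = success_rate([(close, ref), (far, ref)])
```

In this toy case one of the two poses falls under the threshold, so `rate` is 0.5.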

Paper Structure

This paper contains 37 sections, 11 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: RapidDock architecture overview. The molecule is represented by a sequence of its atoms and the corresponding matrix of distances. The protein is represented by its amino acid sequence and its matrix of distances. Learnable embeddings of these distance matrices are added to the attention matrices. Additionally, the model uses ESM-2 embeddings (Lin et al., 2023) to improve its protein representation and embeddings of atom charges to improve its molecule representation.
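The Figure 1 caption describes learnable embeddings of distance matrices being added to the attention matrices. The sketch below shows one way such a distance bias could work in single-head attention. It is written under stated assumptions and is not the authors' implementation: the bucketing of distances, the scalar bias per bucket, and the names `bucketize`, `biased_attention`, `edges`, and `bias_table` are all hypothetical, and the bias values stand in for parameters that would be learned.

```python
import math

def bucketize(dist, edges):
    """Return the index of the first bucket whose upper edge exceeds dist."""
    for i, edge in enumerate(edges):
        if dist < edge:
            return i
    return len(edges)  # open-ended last bucket

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(q, k, v, dists, edges, bias_table):
    """Single-head attention with a per-distance-bucket scalar added to each logit."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        logits = []
        for j in range(len(k)):
            dot = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            # Assumption: one scalar bias per distance bucket, looked up per pair.
            logits.append(dot + bias_table[bucketize(dists[i][j], edges)])
        w = softmax(logits)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out

# Toy inputs (illustrative): zero queries and keys, so only the bias matters.
q = k = [[0.0, 0.0]] * 3
v = [[1.0], [3.0], [100.0]]
dists = [[0.0, 1.0, 10.0], [1.0, 0.0, 10.0], [10.0, 10.0, 0.0]]
edges = [2.0, 8.0]               # buckets: <2 Å, 2-8 Å, >=8 Å
bias_table = [0.0, -1.0, -50.0]  # stand-ins for learned values
out = biased_attention(q, k, v, dists, edges, bias_table)
```

With these toy values the first token attends almost entirely to the two tokens within 2 Å of it, so its output is close to their average value, and the distant token is effectively ignored. A bias that depends only on pairwise distances is unchanged by rotating or translating the input coordinates.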