QuickBind: A Light-Weight And Interpretable Molecular Docking Model

Wojtek Treyde, Seohyun Chris Kim, Nazim Bouatta, Mohammed AlQuraishi

TL;DR

This work investigates the mechanistic basis by which QuickBind makes predictions and finds that the model has learned key physicochemical properties of molecular docking, providing new insights into how machine learning models generate protein-ligand poses.

Abstract

Predicting a ligand's bound pose to a target protein is a key component of early-stage computational drug discovery. Recent developments in machine learning methods have focused on improving pose quality at the cost of model runtime. For high-throughput virtual screening applications, this exposes a capability gap that can be filled by moderately accurate but fast pose prediction. To this end, we developed QuickBind, a light-weight pose prediction algorithm. We assess QuickBind on widely used benchmarks and find that it provides an attractive trade-off between model accuracy and runtime. To facilitate virtual screening applications, we augment QuickBind with a binding affinity module and demonstrate its capabilities for multiple clinically-relevant drug targets. Finally, we investigate the mechanistic basis by which QuickBind makes predictions and find that it has learned key physicochemical properties of molecular docking, providing new insights into how machine learning models generate protein-ligand poses. By virtue of its simplicity, QuickBind can serve as both an effective virtual screening tool and a minimal test bed for exploring new model architectures and innovations. Model code and weights are available at https://github.com/aqlaboratory/QuickBind .

Paper Structure

This paper contains 22 sections, 11 figures, 4 tables, and 5 algorithms.

Figures (11)

  • Figure 1: QuickBind architecture. A "single" representation is first constructed by concatenating embedded protein and ligand input features. A "pair" representation is then constructed from linear embeddings of the single representation, pairwise distances (of protein residues and ligand atoms, independently), relative positional encodings of protein residues, and the adjacency matrix of ligand atoms. The pair representation contains a protein block and a ligand block, as well as mixed off-diagonal elements. The single and pair representations are passed through a modified Evoformer stack. The Structure module then uses the updated single and pair representations, together with initial ligand coordinates from an RDKit conformer [rdkit] and protein coordinates from the input protein structure, to dock the ligand into the binding pocket.
  • Figure 2: Success rates vs. average runtimes (summed over all complexes) for ML-based rigid docking methods on the PDBBind test set. Success rates are reported separately for all 363 complexes (circles) and for the 144 complexes whose proteins are absent from the training and validation sets (diamonds). TANKBind was only evaluated on a subset of 142 unseen proteins by its original authors. Success rates are taken from the original publications, except for NeuralPLexer's success rate on unseen proteins, which was not originally reported. Runtimes do not include preprocessing and were determined on NVIDIA A40 GPUs using the scripts provided in each method's repository, without batching. NeuralPLexer's runtime excludes the acquisition of auxiliary inputs (e.g., AF2 predictions), while the TANKBind and E3Bind runtimes do not include P2Rank segmentation. E3Bind's runtime was taken from its original publication because its authors did not release the model weights and inference code.
  • Figure 3: Success rates of ML-based rigid docking and co-folding models on the PB Benchmark, sorted in ascending order by model runtime (left to right). Target 7M31 was omitted for QuickBind because it is longer than 2,000 residues. Lighter colors correspond to all predictions, darker colors to PB-valid predictions (docking methods only), and hatched bars to results after energy minimization. Success rates are taken from the original publications [RFAA, UMol, AF3, Buttenschoen.2023], except for FABind's success rate, which was not originally reported. We found the EquiBind success rates to be 0.9% (all predictions), 0.2% (PB-valid predictions), 4.2% (all predictions after energy minimization), and 3.5% (PB-valid predictions after energy minimization), in rough agreement with the originally reported values (2.6%, 0.0%, 5.5%, and 4.8%, respectively).
  • Figure 4: Interpretable physicochemical properties in QuickBind's ligand representation. Splitting the Evoformer's single representation into separate protein and ligand representations and then averaging over the ligand's atom dimension yields channel values that correlate with interpretable descriptors, including the number of H-bond acceptors and donors, total hydrophobic surface area, and the number of rotatable bonds.
  • Figure SI.1: Three examples of QuickBind predictions and their RMSDs, randomly chosen from the 100 lowest-RMSD predictions on the PDBBind test set. The ground-truth ligand is shown in red and the QuickBind prediction in blue.
  • ...and 6 more figures
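
The analysis described in the Figure 4 caption (split the single representation, average over ligand atoms, compare channels with descriptors) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that protein rows precede ligand rows in the concatenated single representation, that Pearson correlation is the comparison measure, and it uses made-up toy numbers. The function names `split_single`, `mean_over_atoms`, and `pearson` are hypothetical.

```python
from math import sqrt

def split_single(single, n_res):
    """Split a concatenated single representation (list of rows) into
    protein rows and ligand rows. ASSUMPTION: protein rows come first."""
    return single[:n_res], single[n_res:]

def mean_over_atoms(ligand_rows):
    """Average each channel over the ligand's atom dimension."""
    n_atoms = len(ligand_rows)
    n_channels = len(ligand_rows[0])
    return [sum(row[c] for row in ligand_rows) / n_atoms
            for c in range(n_channels)]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)

# Toy data (invented for illustration): three complexes, two protein
# residues each, two channels per row.
complexes = [
    ([[0, 0], [0, 0], [1, 5], [3, 1]], 2),          # (single, n_res)
    ([[0, 0], [0, 0], [4, 2], [4, 0], [4, 7]], 2),
    ([[0, 0], [0, 0], [5, 1], [7, 3]], 2),
]
# Hypothetical descriptor per ligand, e.g. number of H-bond acceptors.
descriptor = [2, 4, 6]

# One pooled channel vector per ligand.
pooled = [mean_over_atoms(split_single(s, n)[1]) for s, n in complexes]

# Correlate every channel with the descriptor across ligands.
n_channels = len(pooled[0])
channel_corr = [pearson([p[c] for p in pooled], descriptor)
                for c in range(n_channels)]
```

In this toy setup, `channel_corr[0]` is strongly positive because channel 0 was constructed to track the descriptor, while channel 1 was not. With a real model, the pooled vectors would come from the trained Evoformer output and the descriptors from a cheminformatics toolkit.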