Deep Confident Steps to New Pockets: Strategies for Docking Generalization

Gabriele Corso; Arthur Deng; Benjamin Fry; Nicholas Polizzi; Regina Barzilay; Tommi Jaakkola

TL;DR

The paper develops DockGen, a new benchmark based on the ligand-binding domains of proteins, and proposes Confidence Bootstrapping, a new training paradigm that relies solely on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models.

Abstract

Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that existing machine learning-based docking models have very weak generalization abilities. We carefully analyze the scaling laws of ML-based docking and show that, by scaling data and model size, as well as integrating synthetic data strategies, we are able to significantly increase the generalization capacity and set new state-of-the-art performance across benchmarks. Further, we propose Confidence Bootstrapping, a new training paradigm that solely relies on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models. We demonstrate that Confidence Bootstrapping significantly improves the ability of ML-based docking methods to dock to unseen protein classes, edging closer to accurate and generalizable blind docking methods.

Paper Structure (45 sections, 7 equations, 8 figures, 2 tables)

Introduction
Related work
Search-based docking
ML-based docking
Blind docking benchmarks
The DockGen Benchmark
Confidence Bootstrapping
Background
Diffusion models
Self-training methods
Method
Formalization
Experiments
Analyzing docking scaling laws
Increasing the training data
...and 30 more sections

Figures (8)

Figure 1: Visual representation of the Confidence Bootstrapping training scheme. The dashed lines represent the reverse diffusion generation rollouts that the model executes. The dotted lines illustrate the bootstrapping feedback from the confidence model, which is used to update the likelihood of the early diffusion steps by changing the weights of the score model. The pink regions of the protein represent the areas the docking algorithm is still attending to: initially the whole protein, then gradually narrowing to the local environment around the current pose.
Figure 2: A. An example of the superimposition of the pockets of two proteins in PDBBind, 1QXZ in pink and 5M4Q in cyan, that share a very similar binding pocket structure (a bound ligand is shown in red) but have only 22% sequence similarity. While sequence similarity splits would classify them in separate clusters, our approach correctly identifies that the binding domain of these two proteins is the same. B. Comparison of binding sites in train vs test set for both the PDBBind and DockGen datasets. BLOSUM62 and harmonic mean similarity metrics (more details in the dataset analysis appendix) have a maximum of 1 (most similar) and a minimum of 0 (least similar). The densities are clipped at 1% of the maximum value for both datasets to emphasize contamination. Every binding site in the train set was compared to every binding site in the test set, showing significantly higher train-test similarity in the PDBBind dataset than in the DockGen dataset.
Figure 3: Analysis of the scaling laws of DiffDock when measuring its ability to generalize to unseen protein domains. par indicates the number of parameters and the different colors indicate different training sets and augmentations. For the 30M architecture, only one model was trained due to its expensive training cost.
Figure 4: Empirical performance of Confidence Bootstrapping across the 8 protein domain clusters within DockGen-cluster. We did two fine-tuning runs for each cluster and report the averaged results. All performances are measured based on the top-1 pose when taking 8 samples with the fine-tuned models. A. Median confidence of sampled points at every iteration. B. Proportion of top-1 predictions below 2 Å over the course of the iterations for each cluster. C. Performance for each cluster before the fine-tuning and after the K=60 steps of Confidence Bootstrapping. D. Aggregated performance along the fine-tuning for all the clusters, weighted by their count, with the performance of some of the baselines on the same set as references.
Figure 5: Comparison of the sizes of the ligand and the receptor between the test sets of PDBBind and DockGen.
...and 3 more figures
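
The top-1 metric mentioned in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that the 2 Å threshold refers to the ligand RMSD of each sampled pose and that the top-1 pose is the sample with the highest confidence score; the function name `top1_success_rate` and the toy numbers are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def top1_success_rate(rmsds, confidences, threshold=2.0):
    """Fraction of complexes whose highest-confidence pose is below `threshold`.

    rmsds, confidences: array-likes of shape (n_complexes, n_samples).
    Assumption: lower RMSD is better, higher confidence is better.
    """
    rmsds = np.asarray(rmsds, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    if rmsds.ndim != 2 or rmsds.shape != confidences.shape:
        raise ValueError("expected two arrays of shape (n_complexes, n_samples)")

    # Index of the sample the confidence scores rank first, per complex.
    top1_idx = np.argmax(confidences, axis=1)
    # RMSD of that top-ranked sample, per complex.
    top1_rmsd = rmsds[np.arange(rmsds.shape[0]), top1_idx]
    return float(np.mean(top1_rmsd < threshold))


# Hypothetical toy data: 2 complexes, 8 sampled poses each.
toy_rmsds = [
    [1.2, 4.0, 6.5, 3.3, 7.1, 2.8, 5.0, 9.4],
    [3.9, 1.5, 8.2, 4.4, 6.0, 5.7, 7.3, 2.6],
]
toy_confidences = [
    [0.9, 0.1, 0.0, 0.3, 0.0, 0.4, 0.1, 0.0],  # top-ranked pose: 1.2 Å
    [0.8, 0.5, 0.0, 0.2, 0.1, 0.1, 0.0, 0.4],  # top-ranked pose: 3.9 Å
]
rate = top1_success_rate(toy_rmsds, toy_confidences)
```

In this toy case the second complex has a pose below 2 Å among its samples, but the confidence scores rank a worse pose first, so it does not count as a top-1 success.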
