Towards Precision Protein-Ligand Affinity Prediction Benchmark: A Complete and Modification-Aware DAVIS Dataset

Ming-Hsiu Wu, Ziqian Xie, Shuiwang Ji, Degui Zhi

TL;DR

This work addresses the gap in protein–ligand affinity prediction under biologically realistic conditions by creating DAVIS-complete, a modification-aware extension of the standard DAVIS dataset, and proposing three benchmarks that gauge model robustness to protein modifications. It systematically compares docking-free and docking-based approaches, showing that docking-based models generalize better in zero-shot scenarios, while docking-free models tend to overfit to wild-type proteins and struggle with unseen modifications but improve notably with few-shot fine-tuning. The findings highlight persistent generalization gaps and the potential of targeted fine-tuning to mitigate overfitting to wild-type proteins. The curated resource and benchmarks aim to drive development of more generalizable affinity predictors, with implications for precision medicine and drug discovery.

Abstract

Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that do not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and phosphorylation events. This enriched dataset enables benchmarking of predictive models under biologically realistic conditions. Based on this new dataset, we propose three benchmark settings (Augmented Dataset Prediction, Wild-Type to Modification Generalization, and Few-Shot Modification Generalization) designed to assess model robustness in the presence of protein modifications. Through extensive evaluation of both docking-free and docking-based methods, we find that docking-based models generalize better in zero-shot settings. In contrast, docking-free models tend to overfit to wild-type proteins and struggle with unseen modifications but show notable improvement when fine-tuned on a small set of modified examples. We anticipate that the curated dataset and benchmarks offer a valuable foundation for developing models that better generalize to protein modifications, ultimately advancing precision medicine in drug discovery. The benchmark is available at: https://github.com/ZhiGroup/DAVIS-complete
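
To make the zero-shot and few-shot settings named in the abstract concrete, the following is a minimal sketch of how such splits might be built. It is not the benchmark's released code: the `Pair` record, its field names, the function names, and the placeholder identifiers and affinity values are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One protein-ligand measurement (hypothetical schema, not the dataset's)."""
    protein: str       # e.g. "KINASE_A" or "KINASE_A(mod1)"
    ligand: str
    is_modified: bool  # True for substitution/insertion/deletion/phosphorylation forms
    pkd: float         # placeholder affinity value, not a real measurement


def wild_type_to_modification_split(pairs):
    """Zero-shot setting: train on wild-type pairs only, test on modified pairs."""
    train = [p for p in pairs if not p.is_modified]
    test = [p for p in pairs if p.is_modified]
    return train, test


def few_shot_split(pairs, n_shots, seed=0):
    """Few-shot setting: hold out n_shots modified pairs for fine-tuning,
    and evaluate on the remaining modified pairs."""
    modified = [p for p in pairs if p.is_modified]
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(modified)), n_shots))
    support = [p for i, p in enumerate(modified) if i in chosen]
    query = [p for i, p in enumerate(modified) if i not in chosen]
    return support, query


if __name__ == "__main__":
    toy = [
        Pair("KINASE_A", "ligand_1", False, 7.0),
        Pair("KINASE_A(mod1)", "ligand_1", True, 5.5),
        Pair("KINASE_A(mod2)", "ligand_1", True, 6.0),
        Pair("KINASE_B", "ligand_2", False, 6.5),
    ]
    train, test = wild_type_to_modification_split(toy)
    print(len(train), "wild-type training pairs;", len(test), "modified test pairs")
    support, query = few_shot_split(toy, n_shots=1)
    print(len(support), "fine-tuning pair;", len(query), "evaluation pair")
```

In this sketch the few-shot support set is sampled uniformly from all modified pairs; the benchmark itself may select fine-tuning examples differently.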

Paper Structure

This paper contains 38 sections, 15 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: (a) DAVIS-Complete is curated by adding modified kinase protein–ligand pairs previously excluded from DAVIS-Filtered. (b) Example of dataset extension: 14 modifications of the kinase ABL1 are incorporated alongside its wild-type form. (c) Augmented Dataset Prediction benchmark: Wild-type and modified protein–ligand pairs are combined and evaluated under three main splits—new-drug, new-protein, and both-new—each with corresponding sub-splits. (d) Wild-Type to Modification Generalization benchmark: models trained on wild-type pairs are evaluated across (1) global modification generalization, (2) same-ligand different-modifications, and (3) same-modification different-ligands. (e) Few-Shot Modification Generalization: models fine-tuned on limited modified pairs to assess generalization to unseen variants.
  • Figure A1: Distribution of uncapped binding affinity ($pK_d$) for (a) all protein–ligand pairs, (b) wild-type proteins, and (c) modified proteins
  • Figure A2: Heatmap of magnitude of binding affinity change. Colors from blue to red represent either the exact magnitude or the lower bound of the change, while light green indicates untrackable changes.
  • Figure A3: Distribution of magnitude of trackable binding affinity alteration. (a) Both wild-type and modified proteins have $K_d$ values below $10\ \mu$M. The affinity changes are precisely trackable. (b) Wild-type is capped ($K_d > 10\ \mu$M), while the modified protein is not. (c) Modified protein is capped, while the wild-type is not.
  • Figure A4: A case of the Wild-Type to Modification Generalization benchmark: Same-ligand, different-modifications — Staurosporine binding to various EGFR protein variants.
  • ...and 2 more figures
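
The capping cases described in the captions of Figures A2 and A3 can be sketched as follows. This is an illustrative reading of the captions, not the authors' code: the function names are hypothetical, representing a capped measurement as `None` is an assumption, and treating the case where both proteins are capped as "untrackable" is an inference from the Figure A2 caption.

```python
import math

CAP_KD_UM = 10.0  # cap stated in the Figure A3 caption: K_d > 10 uM is capped


def pkd_from_kd_um(kd_um):
    """pK_d = -log10(K_d in mol/L), with K_d given in micromolar."""
    return 6.0 - math.log10(kd_um)


def affinity_change(wt_kd_um, mod_kd_um):
    """Classify the wild-type vs. modified affinity change.

    Each argument is a measured K_d in micromolar, or None when the
    measurement is capped (K_d > 10 uM). Returns (kind, magnitude in pK_d units).
    """
    cap_pkd = pkd_from_kd_um(CAP_KD_UM)
    if wt_kd_um is None and mod_kd_um is None:
        # Assumed reading: both capped, so no change can be quantified.
        return ("untrackable", None)
    if wt_kd_um is not None and mod_kd_um is not None:
        # Figure A3(a): both measured, change is exact.
        delta = abs(pkd_from_kd_um(mod_kd_um) - pkd_from_kd_um(wt_kd_um))
        return ("exact", delta)
    # Figure A3(b)/(c): one side capped, so only a lower bound is known.
    measured = wt_kd_um if wt_kd_um is not None else mod_kd_um
    return ("lower_bound", abs(pkd_from_kd_um(measured) - cap_pkd))


if __name__ == "__main__":
    print(affinity_change(0.1, 1.0))    # both measured
    print(affinity_change(None, 0.01))  # wild-type capped
    print(affinity_change(None, None))  # both capped
```

With the cap at 10 μM the capped $pK_d$ is 5, so a lower bound is the distance between the measured $pK_d$ and 5.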