Benchmark on Drug Target Interaction Modeling from a Drug Structure Perspective

Xinnan Zhang, Jialin Wu, Junyi Xie, Tianlong Chen, Kaixiong Zhou

TL;DR

This work tackles the challenge of fairly benchmarking drug–target interaction (DTI) modeling methods that leverage drug structural information. It introduces GTB-DTI, a comprehensive benchmark comparing explicit graph neural network (GNN) and implicit Transformer drug encoders, across six datasets and with optimized hyperparameters to reveal robust encoder–featurization patterns. Macroscopically, it uncovers task-dependent performance trends (e.g., pretrained protein language models boosting classification) and the value of carefully chosen featurizations; microscopically, it evaluates 31 models to identify design principles and efficiency gains, including memory and convergence benefits. The study culminates in a best-performing combo that achieves state-of-the-art regression results with lower memory and faster convergence, providing a principled baseline to guide future DTI research.

Abstract

Predictive modeling of drug-target interactions is crucial to drug discovery and design, and it has advanced rapidly owing to deep learning. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets by effectively extracting structural information. However, the benchmarking of these methods often varies significantly in hyperparameter settings and datasets, which limits algorithmic progress. In view of this, we conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structural perspective by integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms. We first conduct a macroscopic comparison between these two classes of encoding strategies, as well as between the different featurization techniques that encode molecules' chemical and physical properties. We then carry out a microscopic comparison of all the integrated models across six datasets, comprehensively benchmarking their effectiveness and efficiency. To ensure fairness, we evaluate each model under its individually optimized configuration. Notably, the insights summarized from the benchmark studies lead to the design of model combos. We demonstrate that our combos achieve new state-of-the-art performance on various datasets with cost-effective memory and computation.
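
To make the explicit/implicit distinction concrete, the short Python sketch below shows the two views of one molecule that the two encoder families would consume. It is an illustration only, not the benchmark's code: the molecule (ethanol, SMILES `CCO`), the character-level tokenizer, the hand-written bond list, and the helper `adjacency_matrix` are all assumptions chosen so the example runs without a cheminformatics library.

```python
# Illustrative only: two views of ethanol (SMILES "CCO").
smiles = "CCO"

# Implicit view: a token sequence, as a Transformer-style encoder would consume it.
# Character-level tokenization is an assumption; it only works for
# single-character atom symbols such as those in this toy molecule.
tokens = list(smiles)
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]
positions = list(range(len(tokens)))  # indices a positional encoding would use

# Explicit view: a molecular graph, as a GNN-style encoder would consume it.
# Heavy atoms are nodes; the bonds are written by hand rather than parsed.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


adjacency = adjacency_matrix(len(atoms), bonds)

print("token ids:", token_ids)
print("positions:", positions)
print("adjacency:", adjacency)
```

In the implicit view the bond structure is never stated and must be inferred from token order, whereas the explicit view hands the connectivity to the encoder directly.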

Paper Structure

This paper contains 25 sections, 6 equations, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: Comparison of different encoding strategies for drugs and proteins, trained with early stopping, when the maximum number of epochs is 1000, the learning rate (LR) is 0.0005, the batch size (BS) is 512, and the dropout rate (DR) is 0.2. Trans denotes a Transformer-based model composed of two parts: an embedding layer with positional encoding and the Transformer encoder. ESM refers to ESM2.
  • Figure 2: Performance of GraphDTA-GIN and GraphCPI-GIN with different features on the DAVIS and Human datasets. $+ x$ means that $x$ is added to the basic featurization. All means that all features are used.
  • Figure 3: Overview of our proposed model combos.
  • Figure 4: Label distribution of different datasets for two tasks.
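
The Figure 1 caption mentions an early stop mechanism under a 1000-epoch budget. A minimal sketch of patience-based early stopping follows. The caption does not specify the stopping criterion, so the patience value, the function name `train_with_early_stopping`, and the simulated validation losses are assumptions made purely for illustration.

```python
def train_with_early_stopping(val_losses, max_epochs=1000, patience=3):
    """Return (stop_epoch, best_epoch, best_loss) for a stream of validation losses.

    `val_losses` stands in for a real train-then-validate loop; `patience`
    is a hypothetical setting, not a value taken from the paper.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epochs:
            break  # epoch budget exhausted
        last_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return last_epoch, best_epoch, best_loss


# Simulated validation losses (made up for the example).
simulated = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
stop_epoch, best_epoch, best_loss = train_with_early_stopping(simulated)
print(stop_epoch, best_epoch, best_loss)
```

With a patience of 3, training halts before the late improvement at the final simulated epoch is seen, which illustrates why the patience setting can affect a comparison between encoders that converge at different speeds.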