Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

Qizhi Pei; Lijun Wu; Zhenyu He; Jinhua Zhu; Yingce Xia; Shufang Xie; Rui Yan

TL;DR

This work addresses two weaknesses of current drug-target binding affinity (DTA) prediction, suboptimal accuracy and high training cost, by introducing $k$NN-DTA, a non-parametric, embedding-based retrieval framework applied to a pre-trained DTA model. It combines two neighbor-aggregation schemes (label aggregation via pair-wise retrieval and representation aggregation via point-wise retrieval) into a unified inference-time pipeline, with an optional adaptive extension, Ada-$k$NN-DTA, that learns aggregation weights with lightweight training. On four benchmark datasets (BindingDB IC$_{50}$, BindingDB $K_i$, DAVIS, KIBA), $k$NN-DTA outperforms previous state-of-the-art results (e.g., RMSE of 0.684 on IC$_{50}$ and 0.750 on $K_i$), and Ada-$k$NN-DTA improves these further (0.675 and 0.735), while also showing promising zero-shot transfer performance. The approach shows that retrieval on top of a pre-trained model can improve predictive accuracy without retraining, which is of practical value for virtual screening and drug repurposing.

Abstract

Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method applied to a pre-trained DTA prediction model, which can extend the power of the DTA model at no or negligible cost. Unlike existing methods, we introduce two ways of aggregating neighbors, one in the embedding space and one in the label space, and integrate them into a unified framework. Specifically, we propose a \emph{label aggregation} with \emph{pair-wise retrieval} and a \emph{representation aggregation} with \emph{point-wise retrieval} of the nearest neighbors. The method runs in the inference phase and can efficiently boost DTA prediction performance with no training cost. In addition, we propose an extension, Ada-$k$NN-DTA, an instance-wise and adaptive aggregation with lightweight learning. Results on four benchmark datasets show that $k$NN-DTA brings significant improvements and outperforms previous state-of-the-art (SOTA) results; e.g., on the BindingDB IC$_{50}$ and $K_i$ testbeds, $k$NN-DTA obtains new best RMSE values of $\mathbf{0.684}$ and $\mathbf{0.750}$. The extended Ada-$k$NN-DTA further improves the RMSE to $\mathbf{0.675}$ and $\mathbf{0.735}$. These results demonstrate the effectiveness of our method. Results in other settings and comprehensive studies and analyses also show the potential of our $k$NN-DTA approach.
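The label aggregation with pair-wise retrieval described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: Euclidean distance between embeddings, softmax weights over negative distances with a `temperature`, and a linear mix `lam` between the pre-trained model's prediction and the neighbor-based estimate. The function and parameter names are hypothetical.

```python
import math


def knn_label_aggregation(query, keys, labels, model_pred,
                          k=2, temperature=1.0, lam=0.5):
    """Mix a model prediction with a kNN estimate built from neighbor labels.

    query      : embedding of the (drug, target) pair (list of floats)
    keys       : datastore embeddings of training pairs
    labels     : datastore affinity labels, aligned with `keys`
    model_pred : affinity predicted by the pre-trained model
    All weighting choices here are illustrative assumptions.
    """
    # Distance from the query pair to every stored pair.
    dists = [math.dist(query, key) for key in keys]
    # Indices of the k closest stored pairs.
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances (shifted by the minimum for stability).
    d_min = dists[nearest[0]]
    weights = [math.exp(-(dists[i] - d_min) / temperature) for i in nearest]
    total = sum(weights)
    # Weighted average of the neighbors' ground-truth affinities.
    knn_pred = sum(w * labels[i] for w, i in zip(weights, nearest)) / total
    # Interpolate between the model prediction and the kNN estimate.
    return lam * model_pred + (1.0 - lam) * knn_pred


prediction = knn_label_aggregation(
    query=[0.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]],
    labels=[6.0, 8.0, 100.0],
    model_pred=5.0,
)
print(prediction)
```

In this toy call the two nearest neighbors are equally distant, so their labels are averaged before being mixed with the model prediction; no parameters are trained, which matches the inference-only setting the abstract describes.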

Paper Structure (31 sections, 3 equations, 6 figures, 11 tables)

Introduction
Related Work
Method
Retrieval-based $k$NN-DTA
Label Aggregation with Pair-wise Retrieval
Datastore
Pair-wise Retrieval, Label Aggregation, Affinity Prediction
Representation Aggregation with Point-wise Retrieval
Datastore, Point-wise Retrieval, Representation Aggregation, Affinity Prediction
Unified Framework
Extension: Adaptive Retrieval-based Ada-$k$NN-DTA
Discussion
Experiments
Datasets and Pre-trained DTA Models
Parameters of $k$NN-DTA and Evaluation Metrics
...and 16 more sections

Figures (6)

Figure 1: The overall framework of our $k$NN-DTA and Ada-$k$NN-DTA. We use two Transformer encoders $\mathcal{M}_D$ and $\mathcal{M}_T$ to encode drug $D$ and target $T$. The representations $R_D$ and $R_T$ are used separately for representation aggregation with point-wise retrieval. Meanwhile, the concatenation of $R_D$ and $R_T$ is used for label aggregation with pair-wise retrieval. The dashed grey 'Ada' parts are the lightweight learning modules in Ada-$k$NN-DTA. '$\mathcal{P}$' stands for the prediction module; $(\mathcal{K}, \mathcal{V})$, $\mathcal{K}_D$, and $\mathcal{K}_T$ are the datastores; and $\mathcal{N}$, $\mathcal{N}_D$, and $\mathcal{N}_T$ are the retrieved nearest neighbors. The aggregated representation and the affinity are outlined in red.
Figure 2: Case 1. Among these 32 neighbors, the target is the same for all neighbors.
Figure 3: Embedding visualization for all the drugs that can bind to the target (UniProt ID: P29274). The query drug (CID: 11791862) is in red, and the nearest $8$ drugs are in blue. The number on each node is the ground-truth affinity score.
Figure 4: The architecture of our DTA prediction model, which contains one drug encoder and one target encoder ($\mathcal{M}_D$ and $\mathcal{M}_T$), and one upper prediction module ($\mathcal{P}$). Note that the first $12$ layers of each $N=16$-layer encoder are pre-trained on unlabeled molecules and proteins and then frozen. Only the last $4$ layers are fine-tuned for DTA prediction.
Figure 5: The label distribution of the BindingDB IC$_{50}$ and $K_i$ datasets. The x-axis is the affinity value (after log processing), and the y-axis is the frequency ratio of that affinity value.
...and 1 more figures
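
The representation aggregation with point-wise retrieval named in the Figure 1 caption can be sketched in the same spirit. The caption only states that $R_D$ and $R_T$ are each used to retrieve neighbors from their own datastores ($\mathcal{K}_D$, $\mathcal{K}_T$); the distance, the softmax weighting, and the `mix` coefficient below are assumptions for illustration, and the function name is hypothetical.

```python
import math


def aggregate_representation(query, keys, k=2, temperature=1.0, mix=0.5):
    """Blend a query embedding with a weighted average of its nearest stored embeddings.

    query : embedding of one drug (or one target), e.g. R_D or R_T
    keys  : datastore of embeddings of the same kind, e.g. K_D or K_T
    The weighting scheme and `mix` are illustrative assumptions.
    """
    # Distance from the query to every stored embedding.
    dists = [math.dist(query, key) for key in keys]
    # Indices of the k closest stored embeddings.
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances (shifted by the minimum for stability).
    d_min = dists[nearest[0]]
    weights = [math.exp(-(dists[i] - d_min) / temperature) for i in nearest]
    total = sum(weights)
    # Weighted average of the neighbor embeddings, one dimension at a time.
    neighbor_avg = [
        sum(w * keys[i][j] for w, i in zip(weights, nearest)) / total
        for j in range(len(query))
    ]
    # Interpolate between the original and the neighbor-based representation.
    return [mix * q + (1.0 - mix) * n for q, n in zip(query, neighbor_avg)]


aggregated_drug = aggregate_representation(
    query=[0.0, 0.0],
    keys=[[2.0, 0.0], [0.0, 2.0], [9.0, 9.0]],
)
print(aggregated_drug)
```

Under these assumptions, the same function would be applied once to the drug representation and once to the target representation, and the two aggregated vectors would then be passed to the prediction module $\mathcal{P}$ in place of the original ones.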
