LaPro-DTA: Latent Dual-View Drug Representations and Salient Protein Feature Extraction for Generalizable Drug--Target Affinity Prediction

Zihan Dun, Liuyi Xu, An-Yang Lu, Shuang Li, Yining Qian

Abstract

Drug--target affinity (DTA) prediction is pivotal for accelerating drug discovery, yet existing methods suffer from significant performance degradation in realistic cold-start scenarios (unseen drugs/targets/pairs), primarily driven by overfitting to training instances and information loss from irrelevant target sequences. In this paper, we propose LaPro-DTA, a framework designed to achieve robust and generalizable DTA prediction. To tackle overfitting, we devise a latent dual-view drug representation mechanism. It combines an instance-level view, which captures fine-grained substructures with stochastic perturbation, and a distribution-level view, which distills generalized chemical scaffolds via semantic remapping, thereby forcing the model to learn transferable structural rules rather than memorize specific samples. To mitigate information loss, we introduce a salient protein feature extraction strategy using pattern-aware top-$k$ pooling, which effectively filters background noise and isolates high-response bioactive regions. Furthermore, a cross-view multi-head attention mechanism fuses these purified features to model comprehensive interactions. Extensive experiments on benchmark datasets demonstrate that LaPro-DTA significantly outperforms state-of-the-art methods, achieving an 8\% MSE reduction on the Davis dataset in the challenging unseen-drug setting, while offering interpretable insights into binding mechanisms.
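As an illustration of the top-$k$ pooling idea mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not define the "pattern-aware" variant, so this shows only generic top-k pooling, in which each feature channel keeps its k strongest activations along the sequence axis and discards the rest. The function name `top_k_pool`, the `[seq_len][channels]` layout, and the toy activation values are all assumptions made for the example.

```python
from typing import List


def top_k_pool(features: List[List[float]], k: int) -> List[List[float]]:
    """Generic top-k pooling along the sequence axis (illustrative only).

    features: activation matrix laid out as [seq_len][channels], e.g. the
        output of a convolution over protein residues (assumed layout).
    k: number of highest-response positions to keep per channel.
    Returns a [k][channels] matrix; each channel holds its k largest
    activations in descending order.
    """
    if not features:
        raise ValueError("features must be non-empty")
    seq_len = len(features)
    channels = len(features[0])
    if not 1 <= k <= seq_len:
        raise ValueError("k must satisfy 1 <= k <= seq_len")

    pooled_columns = []
    for c in range(channels):
        # Collect this channel's activation at every sequence position.
        column = [row[c] for row in features]
        # Keep only the k strongest responses; low-response positions
        # (background) are dropped.
        pooled_columns.append(sorted(column, reverse=True)[:k])

    # Transpose back to [k][channels].
    return [[pooled_columns[c][i] for c in range(channels)] for i in range(k)]


# Toy activations: 4 sequence positions, 2 channels (hypothetical values).
activations = [
    [0.1, 0.9],
    [0.7, 0.2],
    [0.3, 0.8],
    [0.05, 0.1],
]
pooled = top_k_pool(activations, k=2)
print(pooled)  # [[0.7, 0.9], [0.3, 0.8]]
```

Because each channel is pooled independently, the output has a fixed size regardless of sequence length, which is one reason this kind of pooling suits variable-length protein sequences.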

Paper Structure (26 sections, 10 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview of the LaPro-DTA architecture. (a) Drug Representation: Extracts drug features using a dual-view mechanism that captures both the instance-level and distribution-level views. (b) Target Representation: Extracts target features by selectively focusing on key bioactive segments rather than the whole sequence. (c) Feature Fusion and Prediction: Aligns drug and target features to capture fine-grained interactions between them. The red line denotes the DeCNN-to-shared-encoder route that generates the distribution-level view.
  • Figure 2: Ablation study on the Davis dataset.
  • Figure 3: Parameter sensitivity analysis on the Davis dataset.
  • Figure 4: Saliency visualization of molecules with and without distribution-level view integration, and t-SNE projection of learned drug representations. The distinct clustering of drugs sharing similar scaffolds demonstrates that the model effectively captures discriminative structural semantics, mapping chemically similar compounds into proximal latent regions.
  • Figure 5: A saliency visualization of the drug--target binding pocket, where colored regions indicate key residues identified by the model.
  • ...and 2 more figures