S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening

Bowei He, Bowen Gao, Yankai Chen, Yanyan Lan, Chen Ma, Philip S. Yu, Ya-Qin Zhang, Wei-Ying Ma

TL;DR

S$^2$Drug addresses a gap in virtual screening (VS): existing deep learning methods rely mainly on 3D pocket structure and overlook protein sequence. It uses a two-stage approach. Stage one pretrains sequence representations on ChEMBL with bilateral data sampling, which reduces redundancy and noise on both the protein and ligand sides. Stage two fine-tunes on PDBbind with a residue-level sequence–structure fusion module plus an auxiliary binding-site prediction task. The method achieves state-of-the-art virtual screening performance on DUD-E and LIT-PCBA and demonstrates accurate binding-site localization, supporting the claim that bridging sequence and structure improves both tasks. This work suggests a practical path toward more generalizable, efficient, and interpretable sequence–structure-aware VS models, with potential extensions to surface and solvent effects.

Abstract

Virtual screening (VS) is an essential task in drug discovery, focusing on the identification of small-molecule ligands that bind to specific protein pockets. Existing deep learning methods, from early regression models to recent contrastive learning approaches, primarily rely on structural data while overlooking protein sequences, which are more accessible and can enhance generalizability. However, directly integrating protein sequences poses challenges due to the redundancy and noise in large-scale protein-ligand datasets. To address these limitations, we propose S$^2$Drug, a two-stage framework that explicitly incorporates protein Sequence information and 3D Structure context in protein-ligand contrastive representation learning. In the first stage, we perform protein sequence pretraining on ChEMBL using an ESM2-based backbone, combined with a tailored data sampling strategy to reduce redundancy and noise on both the protein and ligand sides. In the second stage, we fine-tune on PDBbind by fusing sequence and structure information through a residue-level gating module, while introducing an auxiliary binding site prediction task. This auxiliary task guides the model to accurately localize binding residues within the protein sequence and capture their 3D spatial arrangement, thereby refining protein-ligand matching. Across multiple benchmarks, S$^2$Drug consistently improves virtual screening performance and achieves strong results on binding site prediction, demonstrating the value of bridging sequence and structure in contrastive learning.
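
To make the idea of a residue-level gating module more concrete, the following is a minimal NumPy sketch of one common form of gated fusion: a per-residue sigmoid gate decides, feature by feature, how much of the sequence embedding versus the structure embedding to keep. This is an illustration under stated assumptions, not the paper's implementation. The function name `gated_fusion`, the gate parameters `W` and `b`, the embedding sizes, and the convex-combination form are all hypothetical, and random arrays stand in for real ESM2 and pocket-encoder outputs.

```python
import numpy as np

def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_seq, h_struct, W, b):
    """Fuse per-residue sequence and structure embeddings with a gate.

    h_seq, h_struct: (n_residues, d) arrays, one row per residue.
    W: (2 * d, d) gate weights, b: (d,) gate bias (hypothetical parameters).
    Returns an (n_residues, d) array.
    """
    # The gate sees both views of each residue.
    gate_input = np.concatenate([h_seq, h_struct], axis=-1)  # (n, 2d)
    g = sigmoid(gate_input @ W + b)                           # (n, d), values in (0, 1)
    # Convex combination: g near 1 keeps sequence features, near 0 keeps structure.
    return g * h_seq + (1.0 - g) * h_struct

# Toy inputs standing in for real encoder outputs (assumed shapes).
rng = np.random.default_rng(0)
n_residues, d = 6, 4
h_seq = rng.normal(size=(n_residues, d))
h_struct = rng.normal(size=(n_residues, d))
W = rng.normal(scale=0.1, size=(2 * d, d))
b = np.zeros(d)

fused = gated_fusion(h_seq, h_struct, W, b)
print(fused.shape)
```

Because the gate is computed per residue, a design like this lets the model lean on structure for some residues and on sequence context for others, rather than applying one global mixing weight.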

Paper Structure

This paper contains 54 sections, 15 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Illustration of the proposed two-stage contrastive representation learning framework S$^2$Drug for bridging protein sequence and 3D structure. Red characters mark pocket residues; the sequence encoder, however, is not told which residues belong to the pocket.
  • Figure 2: Virtual screening experiments under homology-exclusion scenarios, used to evaluate how well each method generalizes.
  • Figure 3: Hyperparameter robustness analysis for the balancing coefficient $\lambda$ and the temperature parameter $\tau$.
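
As a rough illustration of where a balancing coefficient $\lambda$ and a temperature $\tau$ typically enter an objective of this kind, here is a minimal NumPy sketch. It assumes a symmetric InfoNCE contrastive loss scaled by $\tau$ and a binary cross-entropy binding-site loss weighted by $\lambda$. The paper's actual loss definitions are not reproduced here. All function names are hypothetical, and the defaults `tau=0.07` and `lam=0.5` are placeholders rather than reported settings.

```python
import numpy as np

def info_nce(protein_emb, ligand_emb, tau):
    """Symmetric InfoNCE; row i of each array is assumed to be a matched pair."""
    p = protein_emb / np.linalg.norm(protein_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = (p @ l.T) / tau  # (B, B) cosine similarities divided by temperature

    def diag_cross_entropy(z):
        z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))    # matched pairs sit on the diagonal

    # Average the protein-to-ligand and ligand-to-protein directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))

def binding_site_bce(prob, labels, eps=1e-7):
    """Binary cross-entropy over per-residue binding-site probabilities."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -np.mean(labels * np.log(prob) + (1.0 - labels) * np.log(1.0 - prob))

def total_loss(protein_emb, ligand_emb, site_prob, site_labels, tau=0.07, lam=0.5):
    """Contrastive loss plus lam times the auxiliary loss (assumed combination)."""
    contrastive = info_nce(protein_emb, ligand_emb, tau)
    auxiliary = binding_site_bce(site_prob, site_labels)
    return contrastive + lam * auxiliary

# Toy batch: 4 protein-ligand pairs and 10 residues with binding-site labels.
rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(4, 8))
ligand_emb = rng.normal(size=(4, 8))
site_prob = rng.uniform(size=10)
site_labels = rng.integers(0, 2, size=10).astype(float)

loss = total_loss(protein_emb, ligand_emb, site_prob, site_labels)
print(float(loss))
```

In a setup like this, a smaller $\tau$ sharpens the softmax over candidate ligands, while $\lambda$ trades off ligand matching against binding-site localization. A robustness analysis would sweep both and track screening metrics.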