
AANet: Virtual Screening under Structural Uncertainty via Alignment and Aggregation

Wenyu Zhu, Jianhui Wang, Bowen Gao, Yinjun Jia, Haichuan Tan, Ya-Qin Zhang, Wei-Ying Ma, Yanyan Lan

TL;DR

This work tackles virtual screening when binding pockets are uncertain or missing, as in apo and AlphaFold-predicted structures. It introduces AANet, an alignment-and-aggregation framework that combines tri-modal contrastive learning with a cross-attention pocket adapter to identify binding sites robustly and score ligands without precise pocket annotations. Key contributions include formalizing structure-based virtual screening (SBVS) under structural uncertainty, cavity-based alignment with hard negatives, and dynamic aggregation across candidate pockets, which enables pocket-agnostic training. On curated apo and AlphaFold2 (AF2) benchmarks, AANet achieves near-holo performance, showing practical potential to extend structure-based drug discovery to targets that lack experimentally resolved complexes.
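The cavity-based alignment with hard negatives rests on a labeling step (described for Figure 2) that compares each detected cavity with the holo pocket by intersection over union (IoU). The sketch below illustrates that step under assumptions that are not taken from the paper: each pocket is treated as a set of residue identifiers, the thresholds `pos_thr=0.5` and `neg_thr=0.1` are placeholders, cavities with intermediate IoU are simply dropped, and `iou` and `label_cavities` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection over union of two residue-ID sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_cavities(holo_pocket, cavities, pos_thr=0.5, neg_thr=0.1):
    """Split detected cavities into positives and hard negatives.

    Thresholds are hypothetical placeholders, not values from the paper.
    Cavities with IoU in [neg_thr, pos_thr) are treated as ambiguous and skipped.
    """
    positives, hard_negatives = [], []
    for name, residues in cavities.items():
        score = iou(holo_pocket, residues)
        if score >= pos_thr:
            positives.append(name)
        elif score < neg_thr:
            # Same protein, wrong site: harder to reject than a pocket from another protein
            hard_negatives.append(name)
    return positives, hard_negatives


holo = {"A45", "A46", "A47", "A80", "A81", "A120"}
cavities = {
    "cav1": {"A45", "A46", "A47", "A80", "A81"},  # IoU 5/6, about 0.83 -> positive
    "cav2": {"A46", "A200", "A201", "A202"},      # IoU 1/9, about 0.11 -> ambiguous
    "cav3": {"A300", "A301", "A302"},             # IoU 0 -> hard negative
}
print(label_cavities(holo, cavities))  # (['cav1'], ['cav3'])
```

Dropping the ambiguous band is one design choice; it keeps partially overlapping cavities from being pushed toward either the positive or the negative side of the contrastive objective.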

Abstract

Virtual screening (VS) is a critical component of modern drug discovery, yet most existing methods, whether physics-based or deep learning-based, are developed around holo protein structures with known ligand-bound pockets. Consequently, their performance degrades significantly on apo or predicted structures such as those from AlphaFold2, which better represent real-world early-stage drug discovery, where pocket information is often missing. In this paper, we introduce an alignment-and-aggregation framework that enables accurate virtual screening under structural uncertainty. Our method comprises two core components: (1) a tri-modal contrastive learning module that aligns representations of the ligand, the holo pocket, and cavities detected from the structure, thereby enhancing robustness to pocket localization errors; and (2) a cross-attention-based adapter that dynamically aggregates candidate binding sites, enabling the model to learn from activity data even without precise pocket annotations. We evaluate our method on a newly curated benchmark of apo structures, where it significantly outperforms state-of-the-art methods in the blind apo setting, improving the early enrichment factor (EF1%) from 11.75 to 37.19. Notably, it also maintains strong performance on holo structures. These results demonstrate the promise of our approach for first-in-class drug discovery, particularly in scenarios lacking experimentally resolved protein-ligand complexes. Our implementation is publicly available at https://github.com/Wiley-Z/AANet.
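To make the EF1% numbers concrete: the enrichment factor at fraction $x$ is the hit rate among the top-ranked $x$ of the library divided by the hit rate of the whole library, $\mathrm{EF}_{x} = \frac{a_{x}/n_{x}}{A/N}$, where $a_{x}$ actives appear among the $n_{x}$ top-scored compounds and $A$ actives appear among all $N$ compounds. The sketch below uses this common definition; it is not the paper's evaluation code, `enrichment_factor` is a hypothetical helper, and the rounding of the top-$x$ cutoff differs between benchmarks.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at `fraction`: hit rate in the top-ranked slice over the overall hit rate.

    labels: 1 for active, 0 for decoy. Higher score = predicted more likely active.
    Assumption: the cutoff is rounded to the nearest integer (conventions vary).
    """
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("EF is undefined when the library contains no actives")
    return (hits_top / n_top) / (total_actives / n)


# Toy library: 1,000 compounds, the first 10 are actives.
labels = [1] * 10 + [0] * 990
# 4 actives score highest, 6 decoys come next, the remaining actives rank low.
scores = [10.0] * 4 + [0.0] * 6 + [5.0] * 6 + [1.0] * 984
print(round(enrichment_factor(scores, labels, 0.01), 2))  # 40.0
```

With 10 actives among 1,000 compounds, the ceiling for EF1% is 100 (all 10 top-ranked compounds active), so a value of 40 means 4 of the 10 top-ranked compounds are active.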

Paper Structure

This paper contains 40 sections, 14 equations, 7 figures, 15 tables, and 3 algorithms.

Figures (7)

  • Figure 1: Performance comparison under holo and apo settings. No docking bar is shown for the apo (blind) setting because of its high computational cost.
  • Figure 2: Cavity-based pocket augmentation with hard negative mining. For each protein–ligand complex, the holo pocket is defined by the bound ligand, and a pocket detection tool scans the protein structure for cavities. Each cavity is labeled positive or negative according to its intersection over union (IoU) with the holo pocket.
  • Figure 3: Model framework. AANet operates in two phases: alignment and aggregation. During alignment, the ligand, holo pocket, and cavity are encoded separately, and their representations are aligned via contrastive losses. In the aggregation phase, the encoders are frozen, and a cross-attention module aggregates the representations of candidate cavities (produced by the cavity encoder), using the ligand embedding as the query. This phase is trained on AlphaFold2-predicted structures without pocket annotations. The ligand embedding is further projected through a trainable linear layer, and a final contrastive loss aligns the adapted ligand representation with the aggregated cavity representation.
  • Figure 4: t-SNE comparison of pocket–ligand embeddings from three structures. (a) DrugCLIP: embeddings of holo pockets and of annotated apo-exp/pred pockets are widely separated. (b) AANet: the embeddings for each target cluster closely together.
  • Figure S1: Correlation between BEDROC ($\alpha=80.5$) and the IoU / coverage of the detected pocket closest to the holo pocket on each target. Left: contrastive pocket–molecule learning; right: tri-modal alignment (ours).
  • ...and 2 more figures
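The two phases described for Figure 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the AANet implementation: embeddings are random toy vectors rather than encoder outputs, the alignment objective is assumed to be an unweighted sum of symmetric InfoNCE terms over the three modality pairs, the cross-attention uses a single head without learned query/key/value projections, and the temperature (0.1), dimensions, and helper names (`info_nce`, `tri_modal_loss`, `aggregate_cavities`, `linear`) are hypothetical.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(xs, ys, temperature=0.1):
    """Symmetric InfoNCE: (xs[i], ys[i]) are positives; other rows are in-batch negatives."""
    xs = [normalize(v) for v in xs]
    ys = [normalize(v) for v in ys]
    logits = [[dot(x, y) / temperature for y in ys] for x in xs]

    def diag_cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            total += m + math.log(sum(math.exp(r - m) for r in row)) - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(columns))


def tri_modal_loss(ligands, holo_pockets, cavities):
    # Assumption: unweighted sum of the three pairwise contrastive terms.
    return (info_nce(ligands, holo_pockets)
            + info_nce(ligands, cavities)
            + info_nce(holo_pockets, cavities))


def aggregate_cavities(query, cavity_embs):
    """Scaled dot-product attention: ligand embedding as query, cavities as keys and values."""
    scale = math.sqrt(len(query))
    scores = [dot(query, c) / scale for c in cavity_embs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * c[k] for w, c in zip(weights, cavity_embs)) for k in range(len(query))]
    return pooled, weights


def linear(W, v):
    # Hypothetical trainable projection applied to the frozen ligand embedding.
    return [dot(row, v) for row in W]


# --- Toy usage with random embeddings (no real encoders involved) ---
random.seed(0)
DIM, BATCH = 8, 4


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]


def jitter(v, sigma):
    return [x + sigma * random.gauss(0.0, 1.0) for x in v]


# Phase 1 (alignment): ligand, holo pocket, and a detected cavity for each complex.
ligs = [rand_vec() for _ in range(BATCH)]
holo = [jitter(l, 0.05) for l in ligs]
cavs = [jitter(h, 0.10) for h in holo]
loss_align = tri_modal_loss(ligs, holo, cavs)

# Phase 2 (aggregation): encoders frozen; each ligand attends over candidate cavities
# on an unannotated structure (here: three random decoy cavities plus its own cavity).
W = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]  # identity init
pooled_all, weights_all = [], []
for lig, cav in zip(ligs, cavs):
    pooled, weights = aggregate_cavities(lig, [rand_vec() for _ in range(3)] + [cav])
    pooled_all.append(pooled)
    weights_all.append(weights)
adapted = [linear(W, l) for l in ligs]
loss_final = info_nce(adapted, pooled_all)
print(f"alignment loss {loss_align:.3f}, aggregation loss {loss_final:.3f}")
```

Because the attention weights are a softmax over candidate cavities, each weight is the share of the pooled representation contributed by one cavity; this soft selection is what lets the aggregation phase train on structures without a pocket label.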