PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep Pharmacophore Modeling

Seonghwan Seo, Woo Youn Kim

TL;DR

PharmacoNet addresses the need for scalable, structure-based pre-screening with a deep-learning framework that (1) performs automated protein-based pharmacophore modeling via instance segmentation, and (2) uses coarse-grained graph matching to predict ligand binding poses at the pharmacophore level, scored with a distance-likelihood function. Ground-truth pharmacophore information is derived from protein-ligand complexes in PDBBind, with non-covalent interactions (NCIs) identified by PLIP across seven pharmacophore types, which enables automated hotspot and pharmacophore generation. In benchmark and large-scale pre-screening experiments, PharmacoNet achieves orders-of-magnitude speedups over traditional docking methods while maintaining competitive enrichment factor (EF) and AUROC, and it generalizes well when trained on reduced data. Its main limitation is the absence of atomic-level energetics; integrating force-field terms or atomistic ML into graph matching and scoring is a possible route to higher accuracy. Overall, PharmacoNet reveals the untapped potential of deep pharmacophore modeling for fast, generalizable, structure-based drug discovery at scale.
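
The enrichment factor (EF) mentioned above can be illustrated with a minimal sketch. It uses the standard definition (fraction of actives in the top-ranked subset divided by the fraction of actives in the whole library) and is not taken from the PharmacoNet code. The function name `enrichment_factor`, its parameters, and the assumption that a higher score means a better-ranked compound are all hypothetical choices made for illustration.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float,
) -> float:
    """Standard EF: active rate in the top-ranked subset / active rate overall.

    Assumption (not from the paper): higher score = better-ranked compound.
    """
    if len(scores) != len(is_active) or len(scores) == 0:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(bool(a) for a in is_active)
    if n_actives == 0:
        raise ValueError("EF is undefined without at least one active compound")

    # Size of the top-ranked subset; keep at least one compound.
    n_top = max(1, round(n_total * top_fraction))

    # Rank compounds from best to worst score and count actives in the top subset.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(bool(active) for _, active in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library of 10 compounds with 3 actives; two of them rank in the top 20%.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [True, True, False, False, False, True, False, False, False, False]
toy_ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
```

In this toy case the top 20% holds two compounds, both active, against an overall active rate of 3/10, so the EF is (2/2)/(3/10). An EF of 1 corresponds to random selection, which is why a pre-screening filter is only useful when its EF stays well above 1 at the chosen filtration rate.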

Abstract

As the size of accessible compound libraries expands to over 10 billion, the need for more efficient structure-based virtual screening methods is growing. Various pre-screening methods have been developed for rapid screening, but there is still a lack of structure-based methods that are applicable to diverse proteins and that perform protein-ligand binding conformation prediction and scoring in an extremely short time. Here, we describe for the first time a deep-learning framework for structure-based pharmacophore modeling to address this challenge. We frame pharmacophore modeling as an instance segmentation problem to determine each protein hotspot and the location of the corresponding pharmacophores, and protein-ligand binding pose prediction as a graph-matching problem. PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function. Furthermore, we show the promising result that PharmacoNet effectively retains hit candidates even under high pre-screening filtration rates. Overall, our study uncovers the hitherto untapped potential of a pharmacophore modeling approach in deep learning-based drug discovery.

Paper Structure

This paper contains 58 sections, 9 equations, 5 figures, 4 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Overview of PharmacoNet.
  • Figure 2: The detailed scheme of PharmacoNet. (a) The architecture of the deep learning model for fully automated protein-based pharmacophore modeling. For visualization, 3D feature maps are represented as 2D maps. The complex-based pharmacophore model is constructed from the crystal structure of the protein-ligand binding complex. (See the pharmacophore modeling section.) (b) The algorithm for inexact graph matching. The numbers in the figure are arbitrary values. (See the graph matching section.)
  • Figure 3: Visualizations of the pharmacophore model for KRAS. Pharmacophores and hot spots are colored as follows: orange for hydrophobic carbon, purple for aromatic ring, red for anion, blue for cation, cyan for H-bond donor, magenta for H-bond acceptor, and yellow for halogen atom and halogen bond acceptor. (a) The generated pharmacophore model of the binding site (PDB ID: 6OIM). (b) The crystal structure of the known inhibitor (AMG-510). (c) The SMINA docking pose of the ligand with the highest pre-screening score.
  • Figure 4: The sigmoid score distribution of tokens in the validation set. The blue histograms show the score distribution of hot spots, and the orange histograms show the score distribution of all tokens within cavities. The numbers in parentheses are the numbers of tokens. The red line marks the threshold for hot spot prediction.
  • Figure 5: Average runtime as a function of the number of RDKit ETKDG conformations.