PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching
Daniel Rose, Oliver Wieder, Thomas Seidel, Thierry Langer
TL;DR
PharmacoMatch tackles the scalability bottleneck of 3D pharmacophore screening in huge chemical spaces by reframing pharmacophore matching as approximate neural subgraph matching learned through self-supervised contrastive learning. A graph neural network encoder maps pharmacophore graphs to an order-embedding space, trained with on-the-fly augmented positive/negative pairs via a max-margin loss to capture query–target relationships. The approach yields substantial runtime speedups (embedding once, fast vector comparisons) while maintaining competitive screening performance against traditional alignment methods on benchmark datasets, enabling practical pre-screening for billion-scale libraries. This work demonstrates the feasibility of vector-database–backed virtual screening and highlights avenues for further improvements in geometry precision and stereochemical discrimination.
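The order-embedding objective mentioned above can be illustrated with a minimal sketch. It assumes the formulation commonly used in neural subgraph matching: a query embedding should be element-wise no larger than the embedding of any target that contains it, and a max-margin loss pushes non-matching pairs to violate that constraint by at least a margin. The function names, the margin value, and the toy vectors are illustrative assumptions; the exact loss used by PharmacoMatch may differ in detail.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (query embedding, target embedding)


def order_violation(query: Vector, target: Vector) -> float:
    """Squared amount by which `query` exceeds `target`, summed over dimensions.

    Zero means the order constraint query <= target holds in every dimension,
    which the embedding space is trained to associate with a match.
    """
    return sum(max(0.0, q - t) ** 2 for q, t in zip(query, target))


def max_margin_loss(positives: List[Pair], negatives: List[Pair],
                    margin: float = 1.0) -> float:
    """Max-margin loss over positive and negative (query, target) pairs.

    Positive pairs are penalized for any violation; negative pairs are
    penalized until their violation reaches at least `margin`.
    """
    pos_term = sum(order_violation(q, t) for q, t in positives)
    neg_term = sum(max(0.0, margin - order_violation(q, t)) for q, t in negatives)
    return pos_term + neg_term


# Toy embeddings (hypothetical values, not taken from the paper).
matching_pair = ([0.2, 0.1], [0.5, 0.3])      # constraint satisfied
non_matching_pair = ([1.0, 0.0], [0.0, 0.0])  # constraint violated by 1.0
loss = max_margin_loss([matching_pair], [non_matching_pair], margin=1.0)
```

In this toy case the loss is zero: the positive pair satisfies the order constraint and the negative pair already violates it by the full margin.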
Abstract
The increasing size of screening libraries poses a significant challenge for the development of virtual screening methods for drug discovery, necessitating a re-evaluation of traditional approaches in the era of big data. Although 3D pharmacophore screening remains a prevalent technique, its application to very large datasets is limited by the computational cost associated with matching query pharmacophores to database molecules. In this study, we introduce PharmacoMatch, a novel contrastive learning approach based on neural subgraph matching. Our method reinterprets pharmacophore screening as an approximate subgraph matching problem and enables efficient querying of conformational databases by encoding query-target relationships in the embedding space. We conduct comprehensive investigations of the learned representations and evaluate PharmacoMatch as a pre-screening tool in a zero-shot setting. We demonstrate significantly shorter runtimes than existing solutions at comparable performance, providing a promising speed-up for screening very large datasets.
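To illustrate how querying in the embedding space could serve as a pre-screening step, the sketch below assumes that database pharmacophores have already been embedded once and stored, so that a query reduces to one cheap vector comparison per entry. The decision rule (an order-embedding violation below a threshold), the function names, the library contents, and the threshold are hypothetical and shown only as an example, not as the authors' implementation.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def order_violation(query: Vector, target: Vector) -> float:
    """Squared amount by which `query` exceeds `target`, summed over dimensions."""
    return sum(max(0.0, q - t) ** 2 for q, t in zip(query, target))


def prescreen(query: Vector, library: Dict[str, Vector],
              threshold: float = 0.0) -> List[str]:
    """Return ids of library entries whose violation is at most `threshold`.

    `library` maps an entry id to its precomputed embedding, so the encoder
    never has to be rerun on the database at query time.
    """
    return [entry_id for entry_id, vec in library.items()
            if order_violation(query, vec) <= threshold]


# Hypothetical precomputed embeddings for two database entries.
library = {
    "mol_a": [0.5, 0.3],
    "mol_b": [0.1, 0.9],
}
query_embedding = [0.2, 0.1]

strict_hits = prescreen(query_embedding, library)                   # exact constraint
relaxed_hits = prescreen(query_embedding, library, threshold=0.02)  # tolerant
```

Entries that pass such a filter would then be handed to a conventional alignment-based method for confirmation, which is the role of a pre-screening tool: the expensive matching step runs only on a reduced candidate set.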
