Scaling Structure Aware Virtual Screening to Billions of Molecules with SPRINT

Andrew T. McNutt; Abhinav K. Adduri; Caleb N. Ellington; Monica T. Dayao; Eric P. Xing; Hosein Mohimani; David R. Koes

TL;DR

The paper addresses the need for scalable, accurate, and interpretable virtual screening by moving from structure-based docking to a vector-based drug-target interaction (DTI) framework. It introduces SPRINT, which learns drug-target co-embeddings using structure-aware protein representations and multi-head attention pooling, enabling rapid proteome-scale searches and interpretable residue-level attention. SPRINT achieves state-of-the-art results on DTI classification and virtual screening benchmarks and competitive results on binding affinity prediction, while supporting pan-species querying across billions of molecules with a vector store. The work aims to accelerate in silico drug discovery and repurposing by making large-scale virtual screening widely accessible and by providing mechanistic insight through attention analyses.
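The attention pooling mentioned above can be illustrated with a minimal sketch. It is a simplified, generic form of attention pooling and not SPRINT's actual architecture: each head is assumed to be a single query vector (`queries`, a hypothetical stand-in for learned parameters) that scores every residue embedding, and the softmax weights are what a residue-level attention map would visualize.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(residues, queries):
    """Pool per-residue embeddings into one fixed-size protein vector.

    residues: list of L residue embeddings, each a list of d floats.
    queries:  list of H query vectors (one per head), each of length d.
              These are hypothetical placeholders for learned parameters.
    Returns (pooled, weights), where pooled concatenates the H weighted
    sums (length H * d) and weights is an H x L list of attention weights.
    """
    d = len(residues[0])
    scale = math.sqrt(d)
    pooled, weights = [], []
    for q in queries:
        # Scaled dot-product score between this head's query and each residue.
        scores = [sum(qi * ri for qi, ri in zip(q, r)) / scale for r in residues]
        w = softmax(scores)
        # Weighted sum of residue embeddings for this head.
        head = [sum(w[i] * residues[i][j] for i in range(len(residues)))
                for j in range(d)]
        pooled.extend(head)
        weights.append(w)
    return pooled, weights
```

Because the weights in each head sum to one over the residues, they can be read directly as a per-residue importance profile, which is the kind of signal an attention map exposes.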

Abstract

Virtual screening of small molecules against protein targets can accelerate drug discovery and development by predicting drug-target interactions (DTIs). However, structure-based methods like molecular docking are too slow to allow for broad proteome-scale screens, limiting their application in screening for off-target effects or new molecular mechanisms. Recently, vector-based methods using protein language models (PLMs) have emerged as a complementary approach that bypasses explicit 3D structure modeling. Here, we develop SPRINT, a vector-based approach for screening entire chemical libraries against whole proteomes for DTIs and novel mechanisms of action. SPRINT improves on prior work by using a self-attention-based architecture and structure-aware PLMs to learn drug-target co-embeddings for binder prediction, search, and retrieval. SPRINT achieves state-of-the-art (SOTA) enrichment factors in virtual screening on LIT-PCBA, DTI classification benchmarks, and binding affinity prediction benchmarks, while providing interpretability in the form of residue-level attention maps. In addition to being both accurate and interpretable, SPRINT is ultra-fast: querying the whole human proteome against the Enamine REAL Database (6.7B drugs) for the 100 most likely binders per protein takes 16 minutes. SPRINT promises to enable virtual screening at an unprecedented scale, opening up new opportunities for in silico drug repurposing and development. SPRINT is available on the web as ColabScreen: https://bit.ly/colab-screen
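Retrieval in a shared embedding space can be sketched as a nearest-neighbor search. The example below is a brute-force illustration, assuming that proteins and molecules are already embedded in the same space and scored by cosine similarity. The function name `top_k_binders` and the choice of cosine similarity are assumptions made for illustration, and a search over billions of molecules would rely on a dedicated vector store rather than this linear scan.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k_binders(protein_vec, molecule_vecs, k=100):
    """Return the k molecules most similar to a protein embedding.

    protein_vec:   protein embedding (list of floats).
    molecule_vecs: list of molecule embeddings in the same space.
    Returns a list of (molecule_index, cosine_similarity), best first.
    """
    p = normalize(protein_vec)
    # Lazily score each molecule; nlargest keeps only the k best in memory.
    scored = (
        (sum(a * b for a, b in zip(p, normalize(m))), idx)
        for idx, m in enumerate(molecule_vecs)
    )
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


# Toy library of four 2-D "molecule" embeddings.
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
hits = top_k_binders([1.0, 0.0], library, k=2)
```

In this formulation, screening a proteome amounts to repeating the same query once per protein embedding, which is why the cost is dominated by vector search rather than by per-pair structure modeling.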
