A Morse Transform for Drug Discovery
Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda
TL;DR
This work addresses ligand-based virtual screening (LBVS) under data scarcity by introducing a topology-driven descriptor based on piecewise-linear Morse theory. Ligands are modeled as pruned Delaunay complexes and analyzed across many directions to produce a 72-dimensional Morse feature vector that captures boundary topology via the Morse data of critical points. A lightweight classifier (LightGBM) then uses this vector for binary active/decoy ranking. Chemistry-aware extensions further boost performance, achieving state-of-the-art AUROC on DUD-E (up to $0.97\pm0.03$) and strong results on MUV (up to $0.74\pm0.12$), while maintaining interpretability and scalability. The approach is robust to sampling depth and directional resolution, and demonstrates that explicit geometric-topological descriptors can rival or surpass deep learning methods in LBVS with far fewer training examples.
Abstract
We introduce a new ligand-based virtual screening (LBVS) framework that uses piecewise-linear (PL) Morse theory to predict ligand binding potential. We model ligands as simplicial complexes via a pruned Delaunay triangulation and catalogue the critical points across multiple directional height functions. This produces a rich feature vector, consisting of crucial topological features -- peaks, troughs, and saddles -- that characterise ligand surfaces relevant to binding interactions. Unlike contemporary LBVS methods that rely on computationally intensive deep neural networks, we require only a lightweight classifier. The Morse theoretic approach achieves state-of-the-art performance on standard datasets while offering an interpretable feature vector and a scalable method for ligand prioritization in early-stage drug discovery.
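To make the idea of cataloguing critical points across directional height functions concrete, here is a minimal sketch of the standard PL Morse classification on a closed triangulated surface. It is an illustration under stated assumptions, not the authors' pipeline: the octahedron stands in for a ligand surface (no Delaunay triangulation or pruning is performed), the function names are hypothetical, and the per-direction counts of minima, saddles, and maxima are not the 72-dimensional vector described above. A vertex is classified by its lower link, i.e. the part of its link lying below it in the chosen direction.

```python
from collections import Counter, defaultdict


def count_components(nodes, edges):
    """Count connected components of the graph induced on `nodes` (union-find)."""
    parent = {u: u for u in nodes}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in edges:
        if a in parent and b in parent:  # keep only edges inside `nodes`
            parent[find(a)] = find(b)
    return len({find(u) for u in nodes})


def classify_vertices(points, triangles, direction):
    """Label each vertex 'min', 'max', 'saddle' or 'regular' for the height
    function h(v) = <v, direction>.

    Assumes a closed triangulated surface (every vertex link is a cycle) and
    a generic direction (no two adjacent vertices share a height).
    """
    height = [sum(p * d for p, d in zip(pt, direction)) for pt in points]

    # Link of v: the edges opposite v in the triangles that contain v.
    link = defaultdict(list)
    for a, b, c in triangles:
        link[a].append((b, c))
        link[b].append((a, c))
        link[c].append((a, b))

    labels = {}
    for v, edges in link.items():
        neighbours = {u for e in edges for u in e}
        lower = {u for u in neighbours if height[u] < height[v]}
        if not lower:
            labels[v] = "min"      # empty lower link
        elif lower == neighbours:
            labels[v] = "max"      # lower link is the whole link
        elif count_components(lower, edges) == 1:
            labels[v] = "regular"  # lower link is a single arc
        else:
            labels[v] = "saddle"   # lower link has two or more arcs
    return labels


def critical_counts(points, triangles, directions):
    """Concatenate (#min, #saddle, #max) over all directions."""
    features = []
    for d in directions:
        tally = Counter(classify_vertices(points, triangles, d).values())
        features += [tally["min"], tally["saddle"], tally["max"]]
    return features


# Toy stand-in for a ligand surface: the boundary of an octahedron.
OCTA_POINTS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
OCTA_TRIANGLES = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
                  (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
# Slightly tilted directions so that no two vertices share a height.
DIRECTIONS = [(0.1, 0.2, 1.0), (1.0, 0.1, 0.2)]

features = critical_counts(OCTA_POINTS, OCTA_TRIANGLES, DIRECTIONS)
print(features)  # -> [1, 0, 1, 1, 0, 1]
```

For each direction the counts satisfy #min - #saddle + #max = 2, the Euler characteristic of a sphere, which is a convenient sanity check on any such classification. A molecule-shaped surface with pockets and protrusions would yield saddles and additional extrema that vary with direction, which is what makes such counts usable as features.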
