Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction
Karina Zadorozhny, Kangway V. Chuang, Bharath Sathappan, Ewan Wallace, Vishnu Sresht, Colin A. Grambow
TL;DR
SQRL reframes molecular activity prediction as learning relative differences between structurally similar compounds. It combines a similarity-thresholded data-matching strategy with a learnable relative representation to train models on pairwise activity differences $\Delta y_{ij}$, improving generalization in low-data regimes and capturing activity cliffs. Across 30 MoleculeACE tasks and internal targets, SQRL yields consistent improvements for deep models, particularly GNNs and pretrained transformers, while relying on informative local pairs rather than indiscriminate all-pairs training. The approach offers a practical paradigm for more robust, similarity-aware drug discovery modeling.
Abstract
Accurate prediction of molecular activities is crucial for efficient drug discovery, yet remains challenging because available datasets are limited and noisy. We introduce Similarity-Quantized Relative Learning (SQRL), a learning framework that reformulates molecular activity prediction as relative difference learning between structurally similar pairs of compounds. SQRL uses precomputed molecular similarities to enhance training of graph neural networks and other architectures, and significantly improves accuracy and generalization in the low-data regimes common in drug discovery. We demonstrate its broad applicability and real-world potential through benchmarking on public datasets as well as proprietary industry data. Our findings show that leveraging similarity-aware relative differences provides an effective paradigm for molecular activity prediction.
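To make the similarity-thresholded pairing concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary fingerprints compared with Tanimoto similarity, a hypothetical threshold of 0.7, and the sign convention $\Delta y_{ij} = y_i - y_j$; none of these choices is specified in the text above, and the function names `tanimoto_matrix` and `similar_pairs` are illustrative.

```python
import numpy as np


def tanimoto_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity for binary fingerprints of shape (n, d)."""
    fps = fps.astype(np.int64)
    inter = fps @ fps.T                      # shared on-bits for every pair
    counts = fps.sum(axis=1)                 # on-bits per fingerprint
    union = counts[:, None] + counts[None, :] - inter
    # Assumption: similarity is defined as 0 when both fingerprints are empty.
    return np.divide(
        inter, union, out=np.zeros(inter.shape, dtype=float), where=union > 0
    )


def similar_pairs(fps: np.ndarray, y: np.ndarray, threshold: float = 0.7):
    """Return indices (i, j) of pairs with similarity >= threshold and y_i - y_j.

    The threshold value and the sign convention are assumptions for illustration.
    """
    sim = tanimoto_matrix(fps)
    i, j = np.triu_indices(len(y), k=1)      # each unordered pair once, i < j
    keep = sim[i, j] >= threshold
    i, j = i[keep], j[keep]
    delta_y = y[i] - y[j]                    # relative-difference training target
    return i, j, delta_y


# Toy data: three compounds with 4-bit fingerprints and made-up activities.
fps = np.array([[1, 1, 1, 0],
                [1, 1, 1, 1],
                [0, 0, 0, 1]])
y = np.array([6.0, 7.5, 5.0])

pair_i, pair_j, delta_y = similar_pairs(fps, y, threshold=0.7)
print(list(zip(pair_i.tolist(), pair_j.tolist())), delta_y.tolist())
```

In this toy set only compounds 0 and 1 exceed the threshold (Tanimoto 0.75), so a model would be trained on that single pair rather than on all three possible pairs. A relative-difference model would then take both members of each retained pair as input and regress `delta_y`.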
