Preferential Multi-Objective Bayesian Optimization for Drug Discovery
Tai Dang, Long-Hung Pham, Sang T. Truong, Ari Glenn, Wendy Nguyen, Edward A. Pham, Jeffrey S. Glenn, Sanmi Koyejo, Thang Luong
TL;DR
This work addresses the hit-selection bottleneck in virtual screening with CheapVS, a chemist-guided preferential multi-objective Bayesian optimization framework that incorporates expert pairwise preferences into the search for drug candidates with multiple desirable properties. It couples a lightweight diffusion-based docking model for estimating binding affinity with a Gaussian-process preference model, enabling efficient exploration of large ligand libraries under a fixed computational budget. The key contributions are: learning a latent utility from expert preferences; a data-augmented diffusion docking approach (EDM-S) for scalable affinity estimation; and an end-to-end screening pipeline that significantly improves recovery of known drugs for EGFR and DRD2 compared with affinity-only baselines. The results show substantial efficiency gains in hit identification, with practical implications for accelerating drug discovery while balancing multiple pharmacokinetic and safety-related objectives.
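To make the idea of screening under a fixed computational budget concrete, the following is a minimal sketch, not the CheapVS implementation. It assumes a hypothetical `expensive_score` callable standing in for a docking call, a hypothetical matrix of cheap descriptors, and a ridge-regression surrogate swapped in for the Gaussian process. Selection is purely greedy on the surrogate's prediction, whereas a Bayesian optimization loop would also use posterior uncertainty in its acquisition function.

```python
import numpy as np

def budgeted_screen(features, expensive_score, budget, batch_size, seed=0):
    """Evaluate at most `budget` candidates; return their indices and scores.

    features: (n, d) array of cheap descriptors (hypothetical).
    expensive_score: callable index -> float (stand-in for a docking call).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    # Start from a random batch so the surrogate has data to fit.
    first = min(batch_size, budget)
    evaluated = rng.choice(n, size=first, replace=False).tolist()
    scores = [expensive_score(i) for i in evaluated]
    while len(evaluated) < budget:
        X = features[evaluated]
        y = np.asarray(scores)
        # Ridge-regression surrogate (assumption: replaces the GP model).
        w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)
        pred = features @ w
        pred[evaluated] = -np.inf  # never re-evaluate a candidate
        k = min(batch_size, budget - len(evaluated))
        batch = np.argsort(pred)[-k:].tolist()  # greedy: highest predictions
        evaluated.extend(batch)
        scores.extend(expensive_score(i) for i in batch)
    return np.array(evaluated), np.array(scores)
```

With a library of 1,000 candidates, `budget=60` corresponds to screening 6% of the library; the expensive scorer is called exactly once per selected candidate.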
Abstract
Despite decades of advances in automated ligand screening, large-scale drug discovery remains resource-intensive and still depends on post-processing hit selection, a step in which chemists manually select a few promising molecules based on chemical intuition. This creates a major bottleneck in virtual screening, requiring experts to repeatedly balance complex trade-offs among drug properties across a vast pool of candidates. To improve the efficiency and reliability of this process, we propose CheapVS, a novel human-centered framework that allows chemists to guide ligand selection by expressing preferences over the trade-offs between drug properties through pairwise comparisons. Our framework combines preferential multi-objective Bayesian optimization with a docking model for measuring binding affinity, capturing human chemical intuition to improve hit identification. Specifically, on a library of 100K chemical candidates targeting EGFR and DRD2, CheapVS outperforms state-of-the-art screening methods in identifying drugs within a limited computational budget. Notably, our method recovers up to 16 of 37 known EGFR drugs and 37 of 58 known DRD2 drugs while screening only 6% of the library, showcasing its potential to significantly advance drug discovery.
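As an illustration of how pairwise comparisons can be turned into a latent utility over drug properties, the sketch below fits a linear utility with a logistic (Bradley-Terry style) likelihood. This is a deliberately simplified stand-in: the framework described above uses a Gaussian-process preference model, and the function name, the property vectors, and the hyperparameters here are all hypothetical.

```python
import numpy as np

def fit_linear_utility(X, pairs, lr=0.5, steps=2000, l2=1e-3):
    """Fit weights w so that sigmoid(w @ (x_win - x_lose)) explains the pairs.

    X: (n, d) array of per-ligand property vectors (hypothetical features,
       e.g. affinity, solubility, toxicity).
    pairs: list of (winner_index, loser_index) from pairwise comparisons.
    """
    X = np.asarray(X, dtype=float)
    winners = np.array([i for i, _ in pairs])
    losers = np.array([j for _, j in pairs])
    diff = X[winners] - X[losers]  # property difference for each comparison
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Probability the model assigns to the observed winner.
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))
        # Gradient of the L2-regularized mean log-likelihood (ascent step).
        grad = diff.T @ (1.0 - p) / len(pairs) - l2 * w
        w += lr * grad
    return w
```

The learned utility of any ligand is then `X @ w`, which can rank candidates by the trade-offs the chemist expressed. A linear model cannot capture non-linear trade-offs, which is one reason a Gaussian process is the more flexible choice.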
