Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models
Hannah Rosa Friesacher, Ola Engkvist, Lewis Mervin, Yves Moreau, Adam Arany
TL;DR
The paper addresses poorly calibrated uncertainty estimates in neural network models for drug-target interaction prediction. It systematically compares metrics used for hyperparameter (HP) tuning and introduces Bayesian Linear Probing (BLP), a computationally efficient last-layer Bayesian approach, evaluating it alongside post hoc Platt scaling and common uncertainty quantification methods. Across three ChEMBL targets, using binary cross-entropy (BCE) loss or adaptive calibration error (ACE) as the HP metric consistently improves probability calibration and, in several cases, AUC as well. BLP matches the calibration of common uncertainty quantification methods at a lower computational cost than full Bayesian treatments. The work gives practical guidance for building reliably calibrated models in drug discovery, supporting better-informed decisions and potentially lower experimental costs. It also shows that combining well-performing uncertainty quantification methods with post hoc calibration can further improve model accuracy and calibration.
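To make the two HP metrics named above concrete, the following is a minimal sketch of how BCE and ACE could be computed on a validation set for a binary activity classifier. It is not the authors' implementation: the function names, the default of 10 bins, and the equal-mass binning over positive-class probabilities are assumptions chosen for illustration.

```python
import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    # Clip to avoid log(0); eps is an illustrative choice.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def adaptive_calibration_error(probs, labels, n_bins=10):
    """Calibration error with equal-mass bins (assumed binary variant).

    Predictions are sorted and split into bins holding (almost) the same
    number of samples; the score is the mean absolute gap between the mean
    predicted probability and the observed positive rate in each bin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(probs, kind="stable")
    prob_bins = np.array_split(probs[order], n_bins)
    label_bins = np.array_split(labels[order], n_bins)
    gaps = [abs(p.mean() - y.mean()) for p, y in zip(prob_bins, label_bins) if p.size > 0]
    return float(np.mean(gaps))
```

Under this sketch, HP tuning would pick the configuration with the lowest validation BCE or ACE instead of the highest accuracy or AUC.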
Abstract
In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However, such models can be poorly calibrated, which results in unreliable uncertainty estimates that do not reflect the true predictive uncertainty. In this study, we compare different metrics, including accuracy and calibration scores, used for model hyperparameter tuning to investigate which model selection strategy achieves well-calibrated models. Furthermore, we propose a computationally efficient Bayesian uncertainty estimation method named Bayesian Linear Probing (BLP), which generates Hamiltonian Monte Carlo (HMC) trajectories to obtain samples for the parameters of a Bayesian logistic regression fitted to the hidden layer of the baseline neural network. We report that BLP improves model calibration and matches the performance of common uncertainty quantification methods by combining the benefits of uncertainty estimation and probability calibration methods. Finally, we show that combining a post hoc calibration method with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
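The BLP idea described in the abstract can be sketched as HMC sampling of logistic regression weights on top of fixed hidden-layer features. The code below is an illustrative sketch, not the authors' implementation: the Gaussian prior and its variance, the step size, the number of leapfrog steps, all function names, and the synthetic matrix `H` (standing in for hidden-layer activations of a trained baseline network) are assumptions.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))


def log_posterior_and_grad(w, H, y, prior_var):
    """Log posterior (up to a constant) of logistic regression weights and its gradient."""
    logits = H @ w
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(w ** 2) / prior_var  # assumed zero-mean Gaussian prior
    grad = H.T @ (y - sigmoid(logits)) - w / prior_var
    return log_lik + log_prior, grad


def hmc_sample(H, y, n_samples=300, n_leapfrog=20, step_size=0.05, prior_var=1.0, seed=0):
    """Draw weight samples with plain HMC (leapfrog integrator, Metropolis correction)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(H.shape[1])
    logp, grad = log_posterior_and_grad(w, H, y, prior_var)
    samples = []
    for _ in range(n_samples):
        p0 = rng.standard_normal(w.shape)  # fresh momentum for each trajectory
        w_new, p = w.copy(), p0.copy()
        logp_new, grad_new = logp, grad
        p += 0.5 * step_size * grad_new  # initial half step for momentum
        for i in range(n_leapfrog):
            w_new += step_size * p  # full step for position
            logp_new, grad_new = log_posterior_and_grad(w_new, H, y, prior_var)
            if i < n_leapfrog - 1:
                p += step_size * grad_new  # full step for momentum
        p += 0.5 * step_size * grad_new  # final half step for momentum
        # Accept or reject based on the change in total energy.
        h_old = -logp + 0.5 * p0 @ p0
        h_new = -logp_new + 0.5 * p @ p
        if np.log(rng.uniform()) < h_old - h_new:
            w, logp, grad = w_new, logp_new, grad_new
        samples.append(w.copy())
    return np.array(samples)


def predict_proba(H, samples):
    """Posterior predictive probability: average the sigmoid over weight samples."""
    return sigmoid(H @ samples.T).mean(axis=1)


# Usage on synthetic data; H is a stand-in for hidden-layer features.
data_rng = np.random.default_rng(42)
H = data_rng.standard_normal((200, 3))
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights
y = (data_rng.uniform(size=200) < sigmoid(H @ true_w)).astype(float)
samples = hmc_sample(H, y)
probs = predict_proba(H, samples)
```

Because only the last-layer weights are sampled, the cost scales with the hidden-layer width instead of the full network size, which is what makes this kind of approach cheaper than a full Bayesian treatment. In practice, early samples would typically be discarded as burn-in before averaging.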
