Conformal Prediction for Uncertainty Estimation in Drug-Target Interaction Prediction
Morteza Rakhshaninejad, Mira Jurgens, Nicolas Dewolf, Willem Waegeman
TL;DR
This work tackles calibrated uncertainty estimation in drug–target interaction (DTI) prediction, where standard marginal conformal prediction (CP) may neglect heterogeneity across drugs and proteins. It introduces three cluster-conditioned conformal prediction variants (CCP-NC, CCP-FC, and CCP-NN) and benchmarks them against Marginal CP and Group-Conditioned CP (GCP) on the KIBA dataset using a gradient-boosting regressor with multi-modal drug and protein features. Across four dataset-splitting schemes, CCP-NC consistently yields tighter prediction intervals and smaller subgroup coverage gaps, particularly in random and New Drug–Protein splits, while GCP performs best when one entity is well represented. The results demonstrate that cluster-based uncertainty quantification can substantially improve coverage and efficiency in DTI prediction and is likely to generalize to other two-entity interaction tasks, offering a data-efficient route to reliable uncertainty estimates in complex biomedical prediction problems.
Abstract
Accurate drug-target interaction (DTI) prediction with machine learning models is essential for drug discovery. Such models should also provide a credible representation of their uncertainty, but applying classical marginal conformal prediction (CP) in DTI prediction often overlooks variability across drug and protein subgroups. In this work, we analyze three cluster-conditioned CP methods for DTI prediction, and compare them with marginal and group-conditioned CP. Clusterings are obtained via nonconformity scores, feature similarity, and nearest neighbors, respectively. Experiments on the KIBA dataset using four data-splitting strategies show that nonconformity-based clustering yields the tightest intervals and most reliable subgroup coverage, especially in random and fully unseen drug-protein splits. Group-conditioned CP works well when one entity is familiar, but residual-driven clustering provides robust uncertainty estimates even in sparse or novel scenarios. These results highlight the potential of cluster-based CP for improving DTI prediction under uncertainty.
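To make the contrast between marginal and group-conditioned CP concrete, the following is a minimal sketch of split conformal regression with absolute-residual nonconformity scores. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`conformal_quantile`, `marginal_cp`, `group_conditioned_cp`), the choice of absolute residuals as the score, and the representation of groups as integer labels are all hypothetical. A cluster-conditioned variant could, in principle, reuse the same routine with cluster labels in place of group labels; the clustering procedures themselves are not reproduced here.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Use the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points: only an unbounded interval is valid.
        return np.inf
    return float(np.sort(scores)[k - 1])


def marginal_cp(cal_pred, cal_true, test_pred, alpha=0.1):
    """One shared interval half-width for every test point."""
    q = conformal_quantile(np.abs(cal_true - cal_pred), alpha)
    return test_pred - q, test_pred + q


def group_conditioned_cp(cal_pred, cal_true, cal_group,
                         test_pred, test_group, alpha=0.1):
    """Separate half-width per group (assumed: integer group labels)."""
    scores = np.abs(cal_true - cal_pred)
    lower = np.empty(len(test_pred), dtype=float)
    upper = np.empty(len(test_pred), dtype=float)
    for g in np.unique(test_group):
        mask = test_group == g
        # Calibrate only on the scores that belong to this group.
        q = conformal_quantile(scores[cal_group == g], alpha)
        lower[mask] = test_pred[mask] - q
        upper[mask] = test_pred[mask] + q
    return lower, upper
```

In this sketch, a group with few calibration points receives a wide or even unbounded interval, which illustrates why conditioning on sparse subgroups can be costly and why pooling similar subgroups into clusters may be more data-efficient.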
