Conformal Prediction for Uncertainty Estimation in Drug-Target Interaction Prediction

Morteza Rakhshaninejad, Mira Jurgens, Nicolas Dewolf, Willem Waegeman

TL;DR

This work tackles calibrated uncertainty estimation in drug–target interaction (DTI) prediction, where standard marginal conformal prediction may neglect heterogeneity across drugs and proteins. It introduces three cluster-conditioned conformal prediction variants (CCP-NC, CCP-FC, and CCP-NN) and benchmarks them against Marginal CP and Group-Conditioned CP (GCP) on the KIBA dataset using a gradient-boosting regressor with multi-modal drug and protein features. Across four dataset-splitting schemes, CCP-NC consistently yields tighter prediction intervals and smaller subgroup coverage gaps, particularly in the random and New Drug–Protein splits, while GCP performs best when one entity is well represented. The results indicate that cluster-based uncertainty quantification can substantially improve coverage and efficiency in DTI prediction and likely generalizes to other two-entity interaction tasks, offering a data-efficient route to reliable uncertainty estimates in complex biomedical prediction problems.

Abstract

Accurate drug-target interaction (DTI) prediction with machine learning models is essential for drug discovery. Such models should also provide a credible representation of their uncertainty, but applying classical marginal conformal prediction (CP) in DTI prediction often overlooks variability across drug and protein subgroups. In this work, we analyze three cluster-conditioned CP methods for DTI prediction, and compare them with marginal and group-conditioned CP. Clusterings are obtained via nonconformity scores, feature similarity, and nearest neighbors, respectively. Experiments on the KIBA dataset using four data-splitting strategies show that nonconformity-based clustering yields the tightest intervals and most reliable subgroup coverage, especially in random and fully unseen drug-protein splits. Group-conditioned CP works well when one entity is familiar, but residual-driven clustering provides robust uncertainty estimates even in sparse or novel scenarios. These results highlight the potential of cluster-based CP for improving DTI prediction under uncertainty.
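To make the contrast between marginal and group-conditioned CP concrete, the following is a minimal sketch of split conformal regression, assuming absolute residuals $|y - \hat{y}|$ as nonconformity scores and symmetric intervals. The function names and the choice of score are illustrative assumptions, not taken from the paper.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score; returns +inf
    when the calibration set is too small for the requested level.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def marginal_interval(y_hat, cal_scores, alpha):
    """Marginal CP: a single quantile shared by every test pair."""
    q = conformal_quantile(cal_scores, alpha)
    return y_hat - q, y_hat + q


def grouped_quantiles(cal_scores, cal_groups, alpha):
    """Group-conditioned CP: one quantile per group label (assumed here
    to be, e.g., a drug or protein identifier)."""
    by_group = {}
    for score, group in zip(cal_scores, cal_groups):
        by_group.setdefault(group, []).append(score)
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}


# Hypothetical calibration residuals for two groups with different noise.
cal_scores = [1, 2, 3, 10, 20, 30]
cal_groups = ["a", "a", "a", "b", "b", "b"]
per_group = grouped_quantiles(cal_scores, cal_groups, alpha=0.5)
```

With a shared quantile, the low-noise group receives intervals that are wider than needed and the high-noise group may be under-covered; per-group quantiles address this, but only when each group has enough calibration points, which is the gap the cluster-conditioned variants aim to fill.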

Paper Structure

This paper contains 15 sections, 14 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Schematic of dataset structure: each interaction is a unique drug--target pair $(d_i, t_j)$ with binding affinity $y(d_i, t_j)$.
  • Figure 2: Illustrative example of dataset preparation for model training and conformal prediction. The drug--target interaction matrix shows samples used for training, calibration, and testing.
  • Figure 3: Two-level clustering in CCP-NC. Left: Nonconformity scores grouped by drug and protein. Middle: ECDF embeddings (10th--90th percentiles) extracted. Right: $k$-means assigns cluster indices $\kappa_{\text{drug}}$, $\kappa_{\text{protein}}$ for cluster-specific quantile computation.
  • Figure 4: Two-level feature-based clustering in CCP-FC. Drugs $d_i$ and proteins $t_j$ are embedded using molecular and sequence features, respectively, and clustered via $k$-means to assign $\kappa_{\text{drug}}$ and $\kappa_{\text{protein}}$, used for grouping calibration scores and estimating cluster-specific quantiles.
  • Figure 5: Observed vs. expected coverage across four dataset splitting strategies. Colors represent the six CP methods; the black diagonal indicates ideal coverage.
  • ...and 3 more figures
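
The pipeline described in the Figure 3 caption (group scores by entity, embed each entity by percentiles of its score distribution, cluster with $k$-means, compute one quantile per cluster) can be sketched for a single level as follows. This is only an illustration under stated assumptions: the percentile grid, the tiny pure-Python $k$-means, and all function names are hypothetical, and how the paper combines $\kappa_{\text{drug}}$ and $\kappa_{\text{protein}}$ is not specified here, so only the drug level is shown.

```python
import math
import random


def empirical_quantile(sorted_vals, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of a sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def ecdf_embedding(scores, levels=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Embed an entity by percentiles of its scores (grid is an assumption)."""
    s = sorted(scores)
    return [empirical_quantile(s, p) for p in levels]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: mean of members (empty clusters keep their center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def conformal_quantile(scores, alpha):
    """ceil((n + 1) * (1 - alpha))-th smallest score, or +inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]


def cluster_quantiles(scores_by_entity, labels, alpha):
    """Pool calibration scores within each cluster, then take its quantile."""
    pooled = {}
    for name, scores in scores_by_entity.items():
        pooled.setdefault(labels[name], []).extend(scores)
    return {c: conformal_quantile(s, alpha) for c, s in pooled.items()}


# Hypothetical per-drug calibration residuals: two easy drugs, two hard ones.
scores_by_drug = {
    "d1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "d2": [0.15, 0.25, 0.35, 0.45, 0.55],
    "d3": [2.0, 3.0, 4.0, 5.0, 6.0],
    "d4": [2.5, 3.5, 4.5, 5.5, 6.5],
}
names = sorted(scores_by_drug)
drug_labels = dict(zip(
    names, kmeans([ecdf_embedding(scores_by_drug[n]) for n in names], k=2)))
q_by_cluster = cluster_quantiles(scores_by_drug, drug_labels, alpha=0.5)
```

Pooling scores across drugs with similar residual distributions gives each cluster a larger calibration set than any single drug would have on its own, which is the data-efficiency argument made in the abstract.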