MuCoS: Efficient Drug Target Discovery via Multi Context Aware Sampling in Knowledge Graphs

Haji Gul, Abdul Ghani Naim, Ajaz Ahmad Bhat

TL;DR

MuCoS frames drug target discovery as link prediction on heterogeneous biomedical knowledge graphs (KGs) and introduces density-based multi-context sampling to extract informative neighborhood patterns. By combining structural context with textual semantics in a BERT-based predictor, MuCoS removes the need for negative sampling and improves generalization to unseen drug–target pairs. On KEGG50k, MuCoS improves MRR by up to 13% for general relation prediction and by 6% for drug-target relation prediction, while training approximately 175x faster than MuCo-KGC. The method is also competitive on standard KG benchmarks, and ablations indicate the importance of the head context, making it a scalable, practical tool for large-scale biomedical KG-driven drug discovery.

Abstract

Accurate prediction of drug-target interactions is critical for accelerating drug discovery and elucidating complex biological mechanisms. In this work, we frame drug-target prediction as a link prediction task on heterogeneous biomedical knowledge graphs (KGs) that integrate drugs, proteins, diseases, pathways, and other relevant entities. Conventional KG embedding methods such as TransE and ComplEx SE are hindered by their reliance on computationally intensive negative sampling and their limited generalization to unseen drug-target pairs. To address these challenges, we propose Multi Context Aware Sampling (MuCoS), a novel framework that prioritizes high-density neighbours to capture salient structural patterns and integrates these with contextual embeddings derived from BERT. By unifying structural and textual modalities and selectively sampling highly informative patterns, MuCoS circumvents the need for negative sampling, significantly reducing computational overhead while enhancing predictive accuracy for novel drug-target associations. Extensive experiments on the KEGG50k dataset demonstrate that MuCoS outperforms state-of-the-art baselines, achieving up to a 13% improvement in mean reciprocal rank (MRR) when predicting any relation in the dataset and a 6% improvement in dedicated drug-target relation prediction.
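
Since the reported gains are stated in terms of MRR, a minimal sketch of how that metric is computed may help. The function name `mean_reciprocal_rank` and the example ranks are illustrative assumptions, not taken from the paper; each rank is the 1-based position of the correct answer in the model's sorted predictions.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries.

    `ranks` holds the 1-based rank of the correct relation (or tail
    entity) for each query; the values used below are made-up examples.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three hypothetical queries whose correct answers were ranked 1st, 2nd and 4th.
example_mrr = mean_reciprocal_rank([1, 2, 4])
```

An MRR of 1.0 would mean the correct answer is always ranked first; lower values mean the correct answer tends to sit further down the list.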

Paper Structure

This paper contains 10 sections, 11 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: A concise overview of the MuCoS model pipeline, which is designed to predict general and drug-target relations as well as tail entities. The boxes on the left show the input sequence to the BERT model, where $h$ denotes the head, $\mathcal{H}_c$ the head context, $t$ the tail, $\mathcal{T}_c$ the tail context, $r$ the relation, and $\mathcal{R}_c$ the relation context. This integrated context is passed through the BERT model, followed by a linear classifier and a softmax function, to generate probabilities for relations and tail entities.
  • Figure 2: MuCoS $\mathcal{H}_c$ construction. The left view illustrates the one-hop context of the head $h$, which consists of the set of relations $\mathcal{R}(h)$ ($r_1, r_2, r_3, r_4, r_5, r_6$) and the set of neighbouring tail entities $\mathcal{E}(h)$ ($e_1, e_2, e_3, e_4, e_5, e_6$) associated with $h$. The middle view shows the sampling process, in which only the top-$n$ (here $n = 3$) tail entities $e$, ranked by their density $\rho(e)$, are selected and concatenated ($\Vert$) to form the optimized head context $\mathcal{H}_c$.
  • Figure 3: $\mathcal{R}_c$ construction. The left view illustrates the relation $r_1$ and the (head, tail) entity pairs it connects. The middle view depicts the optimization step, which selects the top-$k$ (here $k = 2$) entities based on density $\rho$ and retains pairs such as $(e_2, e_3)$ and $(e_6, e_7)$. The optimized context $\mathcal{R}_c$ is aggregated using concatenation ($\Vert$), as shown in the right view.
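
The density-based sampling described in the captions for Figures 2 and 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the density $\rho(e)$ is assumed here to be the number of triples in which an entity appears, a pair is scored by the sum of its two densities, and the helper names (`entity_density`, `head_context`, `relation_context`) and the toy triples are hypothetical.

```python
from collections import Counter


def entity_density(triples):
    """Assumed density rho(e): number of triples containing e as head or tail."""
    rho = Counter()
    for h, _, t in triples:
        rho[h] += 1
        rho[t] += 1
    return rho


def head_context(triples, head, n=3):
    """Return (relations of `head`, top-n neighbouring tails by density)."""
    rho = entity_density(triples)
    relations = sorted({r for h, r, _ in triples if h == head})
    neighbours = {t for h, _, t in triples if h == head}
    # Highest density first; ties broken alphabetically for determinism.
    top_tails = sorted(neighbours, key=lambda e: (-rho[e], e))[:n]
    return relations, top_tails


def relation_context(triples, relation, k=2):
    """Return the top-k (head, tail) pairs linked by `relation`.

    Scoring a pair by the sum of its entity densities is an assumption
    made for this sketch.
    """
    rho = entity_density(triples)
    pairs = {(h, t) for h, r, t in triples if r == relation}
    return sorted(pairs, key=lambda p: (-(rho[p[0]] + rho[p[1]]), p))[:k]


# Toy graph (made-up identifiers, not KEGG50k data).
toy_triples = [
    ("h", "r1", "e1"), ("h", "r2", "e2"), ("h", "r3", "e3"), ("h", "r4", "e4"),
    ("e1", "r5", "x"), ("e1", "r5", "y"), ("e2", "r5", "x"), ("e3", "r6", "z"),
]

relations, top_tails = head_context(toy_triples, "h", n=3)
# Concatenate the selected items into one text segment for a BERT-style input.
head_context_text = " || ".join(relations + top_tails)
top_pairs = relation_context(toy_triples, "r5", k=2)
```

Restricting each context to a fixed number of high-density items keeps the input sequence short, which is consistent with the reduced computational overhead the abstract attributes to selective sampling.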