MuCoS: Efficient Drug-Target Prediction through Multi-Context-Aware Sampling
Haji Gul, Abdul Gani Haji Naim, Ajaz A. Bhat
TL;DR
This work tackles the challenge of predicting drug-target interactions in biomedical knowledge graphs where unseen relations and the overhead of negative sampling hinder traditional KG embedding methods. It introduces MuCoS, a Multi-Context-Aware Sampling approach that uses density-based neighbor selection to form context-rich inputs, which are then processed by BERT to predict relations and tails with a negative-sample-free training regime. Compared to CAB-KGC and other baselines on the KEGG50k dataset, MuCoS achieves higher MRR and Hits@k for drug-target prediction while offering roughly a 10x reduction in computational cost. The approach enables efficient, scalable DTI prediction by leveraging contextualized, structurally informed representations, with potential gains in speeding up drug discovery workflows.
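The density-based neighbor selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not define "density", so the sketch assumes it is an entity's degree (the number of triples it appears in). The function names, the toy triples, and the `" [SEP] "`-joined text layout are all hypothetical choices made for illustration.

```python
from collections import Counter

# Toy knowledge graph; the entities and relations are made up for illustration.
TOY_TRIPLES = [
    ("drugA", "targets", "gene1"),
    ("drugA", "targets", "gene2"),
    ("drugB", "targets", "gene1"),
    ("gene1", "in_pathway", "path1"),
    ("drugA", "treats", "disease1"),
]


def entity_density(triples):
    """ASSUMPTION: density of an entity = number of triples it appears in."""
    counts = Counter()
    for head, _, tail in triples:
        counts[head] += 1
        counts[tail] += 1
    return counts


def top_density_neighbors(triples, entity, k):
    """Return up to k neighbors of `entity`, highest assumed density first."""
    density = entity_density(triples)
    neighbors = set()
    for head, _, tail in triples:
        if head == entity and tail != entity:
            neighbors.add(tail)
        elif tail == entity and head != entity:
            neighbors.add(head)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(neighbors, key=lambda n: (-density[n], n))[:k]


def build_context(triples, head, relation, k=2):
    """Join the head, the relation, and the selected neighbors into one string.

    ASSUMPTION: a plain " [SEP] "-joined layout; the real input format that
    MuCoS feeds to BERT is not specified in this summary.
    """
    parts = [head, relation] + top_density_neighbors(triples, head, k)
    return " [SEP] ".join(parts)


context = build_context(TOY_TRIPLES, "drugA", "targets", k=2)
```

Under these assumptions, only the `k` densest neighbors enter the context, so the input length stays bounded regardless of how many neighbors an entity has. Every training example is built from an observed triple, so no negative triples need to be generated.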
Abstract
Drug-target interactions are critical for understanding biological processes and advancing drug discovery. However, traditional knowledge graph embedding methods such as ComplEx-SE, TransE, and DistMult struggle with unseen relationships and with the overhead of negative triplet sampling, which limits their effectiveness in drug-target prediction. To address these challenges, we propose Multi-Context-Aware Sampling (MuCoS), an efficient and accurate method for drug-target prediction. MuCoS reduces computational complexity by prioritizing higher-density neighbors to capture informative structural patterns. These optimized neighborhood representations are integrated with BERT, enabling contextualized embeddings for accurate prediction of missing relationships or tail entities. MuCoS avoids the need for negative triplet sampling, reducing computation while improving performance on unseen entities and relations. Experiments on the KEGG50k biomedical dataset show that MuCoS improves over existing models by 13% on MRR, 7% on Hits@1, 4% on Hits@3, and 18% on Hits@10 for general relationship prediction, and by 6% on MRR, 1% on Hits@1, 3% on Hits@3, and 12% on Hits@10 for drug-target relationship prediction.
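For readers unfamiliar with the reported metrics, the following minimal sketch shows how MRR and Hits@k are conventionally computed from the rank of the correct answer in each test query. The rank values are invented for illustration and are not results from the paper; the function names are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Illustrative ranks of the correct relation or tail for four test queries.
example_ranks = [1, 2, 4, 10]

mrr = mean_reciprocal_rank(example_ranks)
hits1 = hits_at_k(example_ranks, 1)
hits3 = hits_at_k(example_ranks, 3)
hits10 = hits_at_k(example_ranks, 10)
```

Both metrics lie in the interval (0, 1], and higher is better. MRR rewards placing the correct answer near the top of the ranking, while Hits@k only checks whether it falls within the first k candidates.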
