DrugCLIP: Contrastive Drug-Disease Interaction For Drug Repurposing

Yingzhou Lu, Yaojun Hu, Chenhao Li

TL;DR

DrugCLIP tackles drug repurposing by formulating it as a drug–disease interaction problem and applying a CLIP-like contrastive framework to learn multimodal representations without negative labels. Drugs are encoded as graphs via an MPNN, while diseases are embedded through a GRAM-based hierarchy over ICD-10-CM codes; the model computes cosine similarities between drug and disease embeddings and optimizes a cross-entropy–style loss $\mathcal{L} = - \sum_{i,j} \left[ y_{ij} \log \sigma(s_{ij}) + (1 - y_{ij}) \log \left(1 - \sigma(s_{ij})\right) \right]$, where $s_{ij}$ is the cosine similarity between the embeddings of drug $i$ and disease $j$, $y_{ij}$ is the corresponding pair label, and $\sigma$ is the sigmoid function. A curated dataset of ~35K clinical trials is built from ClinicalTrials.gov, DrugBank, ZINC, and ICD-10, enabling temporally stratified evaluation with a drug database of 2,727–3,083 candidates. Across two test periods (2018–2020 and 2021–2023), DrugCLIP outperforms baselines such as BPMF and DeepDDI, achieving up to ~92% Hit@30% and substantial relative improvements in top-k metrics, with statistical significance (p < 0.05). The work demonstrates the power of contrastive multimodal learning for drug repurposing and provides a practical dataset and framework for future AI-driven repurposing efforts.
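
The loss described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names (`cosine`, `sigmoid`, `pairwise_loss`) and the toy embeddings are hypothetical. The summary does not say how $y_{ij}$ is set for unpaired drugs and diseases, so the example simply assumes $y_{ij} = 1$ for observed pairs and $0$ for all other pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(drug_embs, disease_embs, labels):
    """Binary cross-entropy summed over all (drug i, disease j) pairs.

    labels[i][j] is y_ij. ASSUMPTION: 1 for an observed drug-disease pair,
    0 otherwise; the summary does not specify how y_ij is constructed.
    """
    total = 0.0
    for i, drug in enumerate(drug_embs):
        for j, disease in enumerate(disease_embs):
            # s_ij lies in [-1, 1], so p stays strictly inside (0, 1)
            p = sigmoid(cosine(drug, disease))
            y = labels[i][j]
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Toy data: two drugs and two diseases in a 2-d embedding space.
drugs = [[1.0, 0.0], [0.0, 1.0]]
diseases = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1, 0], [0, 1]]   # drug i paired with disease i
swapped = [[0, 1], [1, 0]]   # labels that contradict the embeddings

loss_matched = pairwise_loss(drugs, diseases, matched)
loss_swapped = pairwise_loss(drugs, diseases, swapped)
```

On this toy data, `loss_matched` is lower than `loss_swapped`: the loss rewards embeddings whose cosine similarity is high for labeled pairs and low for the rest.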

Abstract

Bringing a novel drug from the original idea to market typically requires more than ten years and billions of dollars. To alleviate this heavy burden, a natural idea is to reuse approved drugs to treat new diseases, a process known as drug repurposing or drug repositioning. Machine learning methods have shown great potential for automating drug repurposing. However, they still face challenges, such as the lack of labels and multimodal feature representation. To address these issues, we design DrugCLIP, a cutting-edge contrastive learning method that learns drug–disease interactions without negative labels. Additionally, we have curated a drug repurposing dataset based on real-world clinical trial records. Thorough empirical studies are conducted to validate the effectiveness of the proposed DrugCLIP method.

Paper Structure

This paper contains 18 sections, 8 equations, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: An example of drug repurposing: a drug that can treat lung disease can be reused to treat colds. Drug repurposing involves using approved drugs (or drug candidates that have passed a Phase I trial) to treat new diseases. Compared with de novo drug design, which designs drug molecules from scratch, drug repurposing saves considerable time and resources because the drug's safety has already been largely established.
  • Figure 2: Message passing neural network for drug molecule representation. For each node (red) in the input molecular graph, the message passing neural network iteratively updates its representation by aggregating the representations of its neighbors (orange).
  • Figure 3: Illustration of the graph-based attention model (GRAM), in which the representation of a disease code is a weighted average of its own embedding and those of all its ancestors, with the weights computed by an attention mechanism.
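
The two encoders described in the captions for Figures 2 and 3 can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the paper's architecture. The message-passing step uses an unweighted sum in place of learned message and update functions. The GRAM-style attention uses a dot-product score in place of a learned compatibility function. The function names, the three-level ICD-style chain, and all numeric values are hypothetical.

```python
import math


def message_passing_step(node_feats, adjacency):
    """One round of message passing: each node adds the sum of its
    neighbors' features to its own (ASSUMPTION: no learned weights)."""
    updated = {}
    for node, feat in node_feats.items():
        aggregated = [0.0] * len(feat)
        for neighbor in adjacency.get(node, []):
            for k, value in enumerate(node_feats[neighbor]):
                aggregated[k] += value
        updated[node] = [own + agg for own, agg in zip(feat, aggregated)]
    return updated


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gram_embedding(code, parent_of, base_emb):
    """Attention-weighted average of a code's embedding and its ancestors'.

    ASSUMPTION: attention scores are plain dot products with the code's own
    embedding, a stand-in for a learned compatibility function.
    """
    chain = [code]
    while chain[-1] in parent_of:          # walk up to the root
        chain.append(parent_of[chain[-1]])
    query = base_emb[code]
    scores = [sum(q * a for q, a in zip(query, base_emb[c])) for c in chain]
    weights = softmax(scores)
    dim = len(query)
    embedding = [
        sum(w * base_emb[c][k] for w, c in zip(weights, chain))
        for k in range(dim)
    ]
    return embedding, weights


# Toy molecular graph: a path of three atoms, 0 - 1 - 2.
atoms = {0: [1.0], 1: [2.0], 2: [4.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}
atoms_after = message_passing_step(atoms, bonds)

# Toy disease hierarchy (illustrative chain; intermediate levels omitted).
parent_of = {"J45.909": "J45", "J45": "J00-J99"}
base_emb = {
    "J45.909": [1.0, 0.0],
    "J45": [0.5, 0.5],
    "J00-J99": [0.0, 1.0],
}
disease_vec, attn = gram_embedding("J45.909", parent_of, base_emb)
```

Because the attention weights are a softmax, they are positive and sum to one, so `disease_vec` always lies between the embeddings of the code and its ancestors. This is what lets a rarely observed leaf code borrow information from better-populated parent categories.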