LLM$^3$-DTI: A Large Language Model and Multi-modal data co-powered framework for Drug-Target Interaction prediction

Yuhao Zhang, Qinghong Guo, Qixian Chen, Liuwei Zhang, Hongyan Cui, Xiyi Chen

TL;DR

DTI prediction benefits from incorporating rich textual descriptions of drugs and targets alongside structural topology. The authors propose LLM$^3$-DTI, a framework that uses a domain-specific LLM to encode textual data, then applies a dual cross-attention mechanism and a TSFusion module to align and fuse the multi-modal embeddings. Experiments on a dataset of 708 drugs and 1,493 targets report state-of-the-art performance on ACC, AUROC, AUPR, MCC, and F1. Ablation studies confirm the value of each component, and further analyses cover imbalanced data and cold-start scenarios. The authors present the framework as a step toward more reliable drug repurposing; code and data are publicly available.

Abstract

Drug-target interaction (DTI) prediction is of great significance for drug discovery and drug repurposing. With the accumulation of a large volume of valuable data, data-driven methods have been increasingly used to predict DTIs, reducing costs across various dimensions. This paper proposes a $\textbf{L}$arge $\textbf{L}$anguage $\textbf{M}$odel and $\textbf{M}$ulti-$\textbf{M}$odal data co-powered $\textbf{D}$rug $\textbf{T}$arget $\textbf{I}$nteraction prediction framework, named LLM$^3$-DTI. LLM$^3$-DTI constructs multi-modal data embeddings to enhance DTI prediction performance. In this framework, the text semantic embeddings of drugs and targets are encoded by a domain-specific LLM. To effectively align and fuse the multi-modal embeddings, we propose a dual cross-attention mechanism and the TSFusion module. Finally, the fused multi-modal representations are passed to an output network for the DTI task. The experimental results indicate that LLM$^3$-DTI identifies validated DTIs and outperforms the comparison models across diverse scenarios. The data and code are available at https://github.com/chaser-gua/LLM3DTI.
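
The abstract names a dual cross-attention mechanism but does not specify it, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two embedding sets per entity (a text embedding set and a structure embedding set of equal width), uses plain scaled dot-product attention with no learned projections, and runs attention in both directions. The function names `cross_attention` and `dual_cross_attention` are hypothetical, and the TSFusion module is not modeled here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: queries come from one modality,
    keys and values from the other (assumption: identity projections)."""
    d = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query to every row of the other modality.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted average of the other modality's rows.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def dual_cross_attention(text_emb, struct_emb):
    """Run cross-attention in both directions (hypothetical layout):
    text attends to structure, and structure attends to text."""
    text_to_struct = cross_attention(text_emb, struct_emb)
    struct_to_text = cross_attention(struct_emb, text_emb)
    return text_to_struct, struct_to_text


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]   # toy text-token embeddings
    struct = [[0.5, 0.5]]             # toy structure embedding
    t2s, s2t = dual_cross_attention(text, struct)
    print(t2s, s2t)
```

Each output row is a convex combination of the other modality's rows, which is what lets one modality be re-expressed in terms of the other before any fusion step. A real model would add learned query, key, and value projections and multiple heads.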

Paper Structure

This paper contains 26 sections, 12 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The overall framework we proposed.
  • Figure 2: Ablation study results.
  • Figure 3: Parameter sensitivity analysis.
  • Figure 4: Imbalanced data training performance.
  • Figure 5: Cold start scenario performance.
  • ...and 2 more figures