DMAGT: Unveiling miRNA-Drug Associations by Integrating SMILES and RNA Sequence Structures through Graph Transformer Models

Ziqi Zhang

TL;DR

DMAGT predicts drug–miRNA associations by combining chemical sequence information with graph-structured relationships in a multi-layer transformer-based graph neural network. It embeds SMILES strings and miRNA nucleotide sequences with Word2Vec, applies Laplacian positional encodings and graph attention, and predicts edges with an MLP over concatenated node features. Evaluated on ncDR, RNAInter, and SM2miR, the method achieves an AUC of up to 0.9524 and outperforms eight baselines. In case studies on 5-Fluorouracil and Oxaliplatin, 14 of the 20 top-ranked associations were validated. The approach offers a scalable, data-driven shortcut for miRNA-targeted drug discovery, with future work expanding heterogeneous drug/miRNA representations.

Abstract

Because of their role in gene regulation, miRNAs have opened a new pathway for pharmacology: drug development that targets miRNAs. However, traditional wet-lab experiments are limited by efficiency and cost, making it difficult to explore potential associations between developed drugs and target miRNAs at scale. We therefore designed DMAGT, a novel machine learning model based on a multi-layer transformer-based graph neural network, specifically for predicting associations between drugs and miRNAs. The model transforms drug-miRNA associations into graphs, uses Word2Vec to embed features of drug molecular structures and miRNA base structures, and uses a graph transformer to learn from the embedded features and relational structures, ultimately predicting associations between drugs and miRNAs. To evaluate DMAGT, we tested it on three drug-miRNA association datasets (ncDR, RNAInter, and SM2miR), achieving an AUC of up to $95.24\pm0.05$. DMAGT also outperformed other methods addressing similar problems in comparative experiments. To validate its practical efficacy, we focused on two drugs, 5-Fluorouracil and Oxaliplatin. Of the 20 drug-miRNA associations predicted as most likely, 14 were successfully validated. These experiments demonstrate that DMAGT offers excellent performance and stability in predicting drug-miRNA associations, providing a new shortcut for miRNA drug development.
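
The first two steps described in the abstract, turning sequences into tokens for Word2Vec and turning known associations into a graph, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the overlapping k-mer tokenization, the function names `kmer_tokens` and `build_adjacency`, and all identifiers and strings in the usage note are hypothetical, and the Word2Vec training step itself is left out.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers.

    Assumption: k-mers serve as the 'words' fed to a Word2Vec model;
    the paper summary does not state the tokenization scheme.
    """
    if len(sequence) < k:
        return [sequence]
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_adjacency(drugs, mirnas, known_pairs):
    """Build a symmetric adjacency matrix for the drug-miRNA bipartite graph.

    Nodes are ordered as all drugs followed by all miRNAs. An entry is 1
    when the (drug, miRNA) pair is a known association, otherwise 0.
    """
    index = {name: i for i, name in enumerate(list(drugs) + list(mirnas))}
    n = len(index)
    adj = [[0] * n for _ in range(n)]
    for drug, mirna in known_pairs:
        i, j = index[drug], index[mirna]
        adj[i][j] = 1
        adj[j][i] = 1
    return adj, index
```

For example, `kmer_tokens("UAGCUU", 3)` yields four overlapping 3-mers, and `build_adjacency(["drug_A", "drug_B"], ["mirna_1"], [("drug_A", "mirna_1")])` returns a 3x3 matrix with a single symmetric edge between `drug_A` and `mirna_1`. The names here are toy placeholders, not entries from ncDR, RNAInter, or SM2miR.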

Paper Structure

This paper contains 15 sections, 12 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of DMAGT. a) Embedding the chemical information of miRNAs and drugs into low-dimensional vectors with Word2Vec. b) Reducing the miRNA and drug vectors to 1-dimensional vectors through a convolutional layer. c) Constructing an adjacency matrix from the known associations between miRNAs and drugs. d) Learning the chemical information and associations with a multi-layer transformer-based graph neural network to compute node scores for drugs and miRNAs. e) Predicting the unknown associations from the node scores.
  • Figure 2: (a) ROC curves of the results on the ncDR dataset. (b) PR curves of the results on the ncDR dataset.
  • Figure 3: (a) ROC curves of the results on the RNAInter dataset. (b) PR curves of the results on the RNAInter dataset.
  • Figure 4: (a) ROC curves of the results on the SM2miR dataset. (b) PR curves of the results on the SM2miR dataset.
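
The AUC summarized by the ROC curves in Figures 2 to 4 can be computed directly from predicted association scores. The sketch below uses the pairwise-ranking definition of AUC: the probability that a randomly chosen known association is scored higher than a randomly chosen non-association, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical, and this is not necessarily how the authors computed their reported values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 (1 = known association).
    scores: predicted association scores, same length as labels.
    Runs in O(P * N) time, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # prints 0.75
```

A perfect ranking gives 1.0 and a random one about 0.5, which is the scale on which the reported AUC of up to 0.9524 should be read.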