Devil in the Tail: A Multi-Modal Framework for Drug-Drug Interaction Prediction in Long Tail Distinction

Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang, Xin Chen, Lin Yue, Weitong Chen

TL;DR

TFMD, a novel multi-modal deep learning-based framework, leverages multiple properties of a drug for DDI classification and outperforms the most recent SOTA methods in long-tailed DDI classification tasks.

Abstract

Drug-drug interaction (DDI) identification is a crucial aspect of pharmacology research. There are hundreds of DDI types, and they do not occur with equal frequency. Some rarely occurring DDI types are often high risk and could be life-critical if overlooked, exemplifying the long-tailed distribution problem. Existing models falter against this distribution challenge and overlook the multi-faceted nature of drugs in DDI prediction. In this paper, a novel multi-modal deep learning-based framework, namely TFMD, is introduced to leverage multiple properties of a drug to achieve DDI classification. The proposed framework fuses multimodal features of drugs, including graph-based, molecular structure, target, and enzyme features, for DDI identification. To tackle the distribution skewness across categories, a novel loss function called Tailed Focal Loss is introduced, aimed at further enhancing model performance and addressing the gradient vanishing problem of focal loss on extremely long-tailed datasets. Extensive experiments on four challenging long-tailed datasets demonstrate that TFMD outperforms the most recent SOTA methods in long-tailed DDI classification tasks. The source code is released to reproduce our experimental results: https://github.com/IcurasLW/TFMD_Longtailed_DDI.git
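
The gradient vanishing behaviour of focal loss mentioned in the abstract can be checked numerically. The sketch below is not the authors' code and does not reproduce Tailed Focal Loss, whose formula is not given in this section. It only compares standard cross-entropy, $-\log p$, with the standard focal loss, $-(1-p)^{\gamma}\log p$, where $p$ is the predicted probability of the true class. The function names and the choice $\gamma = 2$ are assumptions made for illustration.

```python
import math


def ce_loss(p: float) -> float:
    """Cross-entropy loss for true-class probability p."""
    return -math.log(p)


def focal_loss(p: float, gamma: float = 2.0) -> float:
    """Standard focal loss; gamma = 2 is an illustrative assumption."""
    return -((1.0 - p) ** gamma) * math.log(p)


def ce_grad(p: float) -> float:
    """Analytic derivative of ce_loss with respect to p."""
    return -1.0 / p


def focal_grad(p: float, gamma: float = 2.0) -> float:
    """Analytic derivative of focal_loss with respect to p."""
    return (gamma * (1.0 - p) ** (gamma - 1.0) * math.log(p)
            - (1.0 - p) ** gamma / p)


# Compare gradient magnitudes as the true-class probability approaches 1.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p={p:<5} |dCE/dp|={abs(ce_grad(p)):.4f} "
          f"|dFL/dp|={abs(focal_grad(p)):.6f}")
```

As $p \rightarrow 1$ the cross-entropy gradient magnitude approaches 1, while the focal loss gradient shrinks toward 0 because of the $(1-p)^{\gamma}$ modulating factor. This is the regime in which, according to the abstract, Tailed Focal Loss is intended to keep a usable gradient.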

Paper Structure

This paper contains 22 sections, 11 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: Naturally long-tailed distribution of drug-drug interactions in the DDI-DB171 dataset
  • Figure 2: The framework of the proposed method (TFMD). (a) Model architecture: the graph and target modalities are treated as strong modalities and are enhanced by the SMILES sequential representation and the enzyme representation, respectively. The enhancement module consists of four independent MLPs that extract feature representations from all the features. The initial modalities are max-pooled to obtain the unique and distinct features of each modality and are then concatenated with the fused representation at the end. All the features are concatenated and fed to a 4-layer NN classifier (right end) for multi-class prediction, depending on the training dataset. (b) & (c) Loss comparison: given a tail class $c_t$, TFL exhibits considerable gradients and a higher loss than FL and CE loss. The gradient offset mechanism in TFL recovers the gradient to at least the level of CE and maintains a considerable gradient as $P_y \rightarrow 1$ instead of vanishing, as shown in (c).
  • Figure 3: Incremental Experiments for Parameter Tuning