GFlowNets for Learning Better Drug-Drug Interaction Representations

Azmine Toushik Wasi

TL;DR

Drug-drug interaction (DDI) prediction is hampered by severe class imbalance across interaction types, which biases models toward frequent interactions. The authors introduce a framework combining Generative Flow Networks (GFlowNets) with a Variational Graph Autoencoder (VGAE) to generate balanced synthetic DDI samples, guided by a reward that favors rare types and VGAE plausibility. An end-to-end pipeline pre-trains a VGAE, trains a GFlowNet with a Trajectory Balance loss, augments the data with synthetic samples, and re-trains the VGAE for final prediction. On DrugBank data, diversity and coverage of rare interaction types improve substantially while standard predictive metrics remain high, demonstrating more robust and clinically relevant DDI representations. This approach offers a scalable strategy for imbalanced biomedical graph problems and can generalize to other rare-event prediction tasks.

Abstract

Drug-drug interactions (DDIs) pose a significant challenge in clinical pharmacology, with severe class imbalance among interaction types limiting the effectiveness of predictive models. Common interactions dominate datasets, while rare but critical interactions remain underrepresented, leading to poor model performance on infrequent cases. Existing methods often treat DDI prediction as a binary problem, ignoring class-specific nuances and exacerbating bias toward frequent interactions. To address this, we propose a framework combining Generative Flow Networks (GFlowNets) with Variational Graph Autoencoders (VGAEs) to generate synthetic samples for rare classes, improving model balance and generating effective, novel DDI pairs. Our approach enhances predictive performance across interaction types, ensuring better clinical reliability.

Paper Structure

This paper contains 18 sections, 7 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: The proposed end-to-end framework. (a) A VGAE is first pre-trained on the original imbalanced DDI graph to learn drug embeddings. (b) A GFlowNet is then trained, using a reward function that combines plausibility from the VGAE and a rareness score, to learn a policy for generating synthetic DDIs. (c) The original data is augmented with the GFlowNet samples and used to train the final, robust VGAE model.
  • Figure 2: Composition of the GFlowNet reward function. The reward for a generated sample $(d_i, d_j, t)$ is the product of a rareness score, which is inversely proportional to the type's frequency in the training data, and a plausibility score, derived from the pre-trained VGAE decoder's confidence.
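
The reward composition described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the rareness score is normalised so the rarest type scores 1.0 (the caption says only that it is inversely proportional to frequency), and the plausibility score uses a sigmoid of the embedding inner product, the standard VGAE decoder, as a stand-in for whatever type-aware decoder the paper actually uses.

```python
import math


def rareness_scores(type_counts):
    """Inverse-frequency score per interaction type.

    Assumption: scores are scaled so the rarest type gets 1.0.
    """
    min_count = min(type_counts.values())
    return {t: min_count / c for t, c in type_counts.items()}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plausibility(z_i, z_j):
    """Decoder confidence for a drug pair given their embeddings.

    Assumption: an inner-product decoder stands in for the
    pre-trained VGAE decoder.
    """
    return sigmoid(sum(a * b for a, b in zip(z_i, z_j)))


def reward(z_i, z_j, t, rareness):
    """Reward for a sample (d_i, d_j, t): rareness times plausibility."""
    return rareness[t] * plausibility(z_i, z_j)


# Toy interaction-type counts (illustrative values, not from the paper).
counts = {"common_type": 900, "rare_type": 10}
rareness = rareness_scores(counts)

# Toy 2-dimensional embeddings for two drugs.
z_a, z_b = [0.5, 1.0], [1.0, 0.5]
r_rare = reward(z_a, z_b, "rare_type", rareness)
r_common = reward(z_a, z_b, "common_type", rareness)
```

Because the two factors are multiplied, a pair the decoder considers implausible receives a low reward even for a rare type, and a plausible pair of a very frequent type is also down-weighted. In this toy setup the same drug pair earns a reward 90 times larger when labelled with the rare type than with the common one.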