Augmenting generative models with biomedical knowledge graphs improves targeted drug discovery

Aditya Malusare, Vineet Punyamoorty, Vaneet Aggarwal

TL;DR

K-DREAM is a knowledge-graph-guided diffusion framework for molecule generation. It embeds biomedical knowledge from PrimeKG using TransE and steers the diffusion process with a Context Regressor Network and interpolation in the embedding space. The approach yields biologically relevant drug candidates and supports multi-target design, achieving state-of-the-art docking performance across five protein targets (mean top-5% docking scores of roughly -12 to -10 kcal/mol) and outperforming multiple baselines. Ablation studies confirm the critical role of KG guidance, and interpolating in the KG embedding space produces dual-target compounds with balanced docking scores against both targets. By integrating knowledge graphs into generative chemistry, K-DREAM offers a scalable route to biologically informed discovery that could accelerate preclinical development, although its candidates still require experimental validation and its results may inherit biases present in the knowledge graph. The strength of the guidance is controlled by the hyperparameter $\lambda_X$, which trades off exploration against target-focused optimization.
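TransE represents each knowledge-graph triple (head, relation, tail) as a translation, so a plausible triple should satisfy h + r ≈ t in embedding space. The sketch below shows the TransE dissimilarity and the margin-based ranking loss it is trained with, in plain Python. The three-dimensional vectors and entity roles are hypothetical toy values, not PrimeKG embeddings, and the snippet is not the authors' training code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def transe_distance(h: Vector, r: Vector, t: Vector, norm: int = 2) -> float:
    """Dissimilarity ||h + r - t|| (L1 or L2); lower means a more plausible triple."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def margin_ranking_loss(pos_dist: float, neg_dist: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the true triple is at least `margin` closer than the corrupted one."""
    return max(0.0, margin + pos_dist - neg_dist)


# Hypothetical toy embeddings (NOT real PrimeKG vectors).
drug = [0.1, 0.2, 0.0]
targets = [1.0, 0.0, 0.0]      # relation, e.g. "drug targets protein"
protein = [1.1, 0.2, 0.0]      # true tail: drug + targets lands here
unrelated = [-1.0, 1.0, 0.5]   # corrupted tail used as a negative sample

pos = transe_distance(drug, targets, protein)
neg = transe_distance(drug, targets, unrelated)
loss = margin_ranking_loss(pos, neg)
print(f"pos={pos:.3f} neg={neg:.3f} loss={loss:.3f}")
```

In practice the embeddings are learned by minimizing this loss over many true and corrupted triples; the resulting entity vectors are the kind of representation that can then serve as conditioning targets for a generator.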

Abstract

Recent breakthroughs in generative modeling have demonstrated remarkable capabilities in molecular generation, yet the integration of comprehensive biomedical knowledge into these models has remained an untapped frontier. In this study, we introduce K-DREAM (Knowledge-Driven Embedding-Augmented Model), a novel framework that leverages knowledge graphs to augment diffusion-based generative models for drug discovery. By embedding structured information from large-scale knowledge graphs, K-DREAM directs molecular generation toward candidates with higher biological relevance and therapeutic suitability. This integration ensures that the generated molecules are aligned with specific therapeutic targets, moving beyond traditional heuristic-driven approaches. In targeted drug design tasks, K-DREAM generates drug candidates with improved binding affinities and predicted efficacy, surpassing current state-of-the-art generative models. It also demonstrates flexibility by producing molecules designed for multiple targets, enabling applications to complex disease mechanisms. These results highlight the utility of knowledge-enhanced generative models in rational drug design and their relevance to practical therapeutic development.
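Steering generation toward a target embedding can be illustrated with a classifier-guidance-style update, in which the gradient of a regressor's distance to the target is added to the sampler's score, weighted by a guidance strength. The sketch below shows that idea in plain Python under heavy simplifying assumptions: the "molecule" is a continuous 2-d vector, the unconditional score is that of a standard Gaussian, and a fixed linear map stands in for the Context Regressor Network. The names `guided_step`, `guidance_grad`, and `sample`, and the parameter `lam` (playing the role of $\lambda_X$), are hypothetical; this is not K-DREAM's actual sampler.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def guidance_grad(W: Matrix, x: Vector, target: Vector) -> Vector:
    """Gradient of ||W x - target||^2 with respect to x, i.e. 2 W^T (W x - target)."""
    residual = [p - t for p, t in zip(matvec(W, x), target)]
    return [2.0 * sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(len(x))]


def guided_step(
    x: Vector,
    base_score: Callable[[Vector], Vector],
    W: Matrix,
    target: Vector,
    lam: float,
    step: float = 0.1,
    noise_scale: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """One Langevin-style update: follow the unconditional score and descend the
    (hypothetical) regressor's squared distance to the target, weighted by `lam`."""
    s = base_score(x)
    g = guidance_grad(W, x, target)
    noise = [rng.gauss(0.0, 1.0) if rng is not None else 0.0 for _ in x]
    return [xi + step * (si - lam * gi) + noise_scale * ni
            for xi, si, gi, ni in zip(x, s, g, noise)]


def sample(lam: float, steps: int = 200) -> Vector:
    W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical stand-in for a learned regressor
    target = [1.0, -1.0]           # hypothetical target embedding
    x = [0.0, 0.0]
    for _ in range(steps):
        # Standard-Gaussian prior: score(x) = -x.
        x = guided_step(x, lambda v: [-vi for vi in v], W, target, lam)
    return x


for lam in (0.0, 1.0, 5.0):
    print(lam, [round(v, 4) for v in sample(lam)])
```

With noise disabled, the iterates settle where the prior and guidance terms balance, at x = 2λ/(1 + 2λ) · target in this toy setup: λ = 0 stays at the prior mode [0, 0], while λ = 1 and λ = 5 move about 0.667 and 0.909 of the way to the target (with step = 0.1 the iteration diverges once λ exceeds 9.5). Larger λ pulls samples closer to the target at the cost of straying from the prior, which mirrors the exploration versus target-focus trade-off described for $\lambda_X$.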

Paper Structure

This paper contains 21 sections, 11 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The Knowledge-Driven Embedding-Augmented Model (K-DREAM). A Overview of the K-DREAM generative model for molecular structures, which incorporates guidance from embeddings derived from the knowledge graph. The diffusion process is guided to produce molecules that are both chemically valid and biologically relevant to a given target embedding. Molecules are evaluated using metrics such as docking score, Quantitative Estimate of Drug-likeness (QED), and Synthetic Accessibility (SA). B Embeddings of the PrimeKG knowledge graph [Chandak2023], created using the TransE model [NIPS2013_1cecc7a7] and projected with the Uniform Manifold Approximation and Projection (UMAP) algorithm. C Visualization of the relationships between UMAP-projected PrimeKG embeddings using edge bundling, a technique that reduces visual clutter in network visualizations by grouping edges that follow similar paths. Bundled edges represent relationships between entities, with thicker bundles indicating stronger or more numerous connections between related clusters.
  • Figure 2: Docking scores. The mean docking scores (kcal/mol; more negative is better) of the top 5% of molecules generated by K-DREAM are compared against baseline models for five protein targets (left). For each protein, the top 3 molecules generated by K-DREAM are also shown with their docking scores (right).
  • Figure 3: Molecular docking score distribution at varying guidance levels. The strength of guidance is controlled by the hyperparameter $\lambda_X$, which sets the weight of the guidance term in the loss function.
  • Figure 4: Multi-target Drug Design. Distribution of the top 10% of molecules (ranked by docking score) generated by K-DREAM when guided toward PARP1 (green), toward JAK2 (orange), and toward an interpolated target (blue), evaluated by their docking scores against PARP1 (x-axis) and JAK2 (y-axis). The molecules generated for the interpolated target strike a balance between the two proteins, with strong mean scores on both axes.
  • Figure 5: Targeted Drug Design - QED, SA and SIM scores. The distributions of Quantitative Estimate of Drug-likeness (QED), Synthetic Accessibility (SA) and Tanimoto similarity (SIM) scores for the top 5% candidates chosen for evaluation in the targeted drug design experiment.
  • ...and 1 more figure
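The evaluation quantities named in the captions above can be made concrete with a few small helpers. This is a sketch under stated assumptions: "top 5%" is taken as the mean of the lowest ceil(5% of n) docking scores, Tanimoto similarity is computed on fingerprints represented as sets of on-bit indices (a real pipeline would typically generate these with a cheminformatics toolkit), and interpolation between two target embeddings is assumed to be linear. None of these details, nor the example numbers, come from the paper; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def top_fraction_mean(docking_scores: Sequence[float], frac: float = 0.05) -> float:
    """Mean of the best `frac` of docking scores (kcal/mol; more negative is better)."""
    if not docking_scores:
        raise ValueError("need at least one docking score")
    # Small tolerance so a product that overshoots an integer through float
    # rounding (e.g. 7.000000000000001 instead of 7) does not bump k up by one.
    k = max(1, math.ceil(len(docking_scores) * frac - 1e-9))
    best = sorted(docking_scores)[:k]
    return sum(best) / k


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def interpolate(e_a: Sequence[float], e_b: Sequence[float], alpha: float) -> List[float]:
    """Linear interpolation between two target embeddings: alpha=0 gives e_a, alpha=1 gives e_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(e_a, e_b)]


# Hypothetical inputs, not results from the paper.
scores = [-8.1, -12.3, -9.7, -10.4, -7.5, -11.0, -6.2, -9.9, -8.8, -10.1]
print(top_fraction_mean(scores, frac=0.2))
print(tanimoto({1, 3, 5, 7}, {3, 5, 8}))
print(interpolate([1.0, 0.0], [0.0, 1.0], 0.5))
```

Sweeping `alpha` from 0 to 1 traces a path between two target embeddings; an intermediate value is one plausible way to build a dual-target conditioning vector like the interpolated PARP1/JAK2 target in Figure 4, though the paper's exact interpolation scheme is not specified here.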