
Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model

Aditya Malusare, Vaneet Aggarwal

TL;DR

KARL addresses the gap between rich biomedical knowledge and molecular generation by integrating knowledge graph embeddings (KGEs) into a diffusion-based graph generator, aided by a property-inference network that maps molecular graphs to their KG context. It introduces a domain-constrained KGE training approach and a conditional diffusion framework with a reinforcement-learning reward that optimizes drug-likeness, synthesizability, novelty, and target-specific properties. Empirically, KARL achieves strong unconditional generation performance, improves KG embedding quality under domain constraints, and improves targeted drug discovery, as measured by docking metrics on multiple proteins. This knowledge-enhanced pipeline enables interpretable, scalable, and controllable generation of drug candidates guided by biomedical knowledge graphs.
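The reinforcement-learning reward described above collapses several molecular properties into one scalar. The sketch below shows one way such a reward could be assembled, assuming each property has already been computed elsewhere (for example validity via RDKit sanitization, QED drug-likeness in [0, 1], and a synthetic accessibility (SA) score in [1, 10], where lower is easier). The names `Candidate` and `composite_reward`, the weights, and the SA rescaling are hypothetical choices for illustration, not the paper's formulation; a target-specific term such as a docking score could be added in the same way.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Precomputed properties of one generated molecule (hypothetical container)."""
    smiles: str       # assumed canonical SMILES, so string equality means same molecule
    valid: bool       # e.g. the graph passed RDKit sanitization
    qed: float        # drug-likeness in [0, 1], higher is better
    sa_score: float   # synthetic accessibility in [1, 10], lower is easier to make


def composite_reward(c, training_smiles, w_qed=1.0, w_sa=1.0, w_novel=0.5):
    """Weighted reward in [0, 1]; the weights are illustrative, not the paper's values."""
    if not c.valid:
        return 0.0  # invalid molecules earn nothing
    sa_norm = (10.0 - c.sa_score) / 9.0  # rescale SA so 1.0 means easiest to synthesize
    novelty = 0.0 if c.smiles in training_smiles else 1.0
    total = w_qed * c.qed + w_sa * sa_norm + w_novel * novelty
    return total / (w_qed + w_sa + w_novel)  # normalize by total weight


if __name__ == "__main__":
    train = {"CCO"}
    # Property values below are placeholders, not computed from the molecules.
    print(composite_reward(Candidate("CCN", True, 0.8, 1.0), train))
```

Gating on validity first means the policy never receives credit for chemically impossible graphs, while the normalization keeps the reward on a fixed scale as terms are added or reweighted.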

Abstract

Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the extensive biomedical knowledge often systematized in knowledge graphs, whose potential to inform and enhance generative processes has not yet been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.
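The exact domain-constrained KGE procedure is not spelled out in this summary. One plausible reading, shown here as a hypothetical sketch, is type-constrained negative sampling: when a true triple is corrupted to create a negative training example, the replacement entity is drawn only from entities of the same type (a protein is replaced only by another protein), so negatives remain semantically meaningful. TransE is swapped in as the scoring model purely for illustration, and the entities, relations, and helper names are invented.

```python
import random

# Toy knowledge graph with invented entities: entity -> type, plus (head, relation, tail) triples.
ENTITY_TYPE = {
    "drugA": "drug", "drugB": "drug",
    "protP": "protein", "protQ": "protein",
    "disX": "disease",
}
TRIPLES = [
    ("drugA", "targets", "protP"),
    ("drugB", "targets", "protQ"),
    ("drugA", "treats", "disX"),
]


def transe_distance(h, r, t):
    """TransE distance ||h + r - t||_2; a smaller value means a more plausible triple."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5


def corrupt_tail(triple, entity_type, known, rng):
    """Swap the tail for a different entity of the SAME type (the domain constraint),
    skipping replacements that would produce a known true triple."""
    h, r, t = triple
    candidates = [e for e, ty in entity_type.items()
                  if ty == entity_type[t] and e != t and (h, r, e) not in known]
    return (h, r, rng.choice(candidates)) if candidates else None


def margin_loss(pos, neg, emb, rel, margin=1.0):
    """Margin ranking loss max(0, margin + d(pos) - d(neg)) used to train TransE."""
    d_pos = transe_distance(emb[pos[0]], rel[pos[1]], emb[pos[2]])
    d_neg = transe_distance(emb[neg[0]], rel[neg[1]], emb[neg[2]])
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    rng = random.Random(0)
    known = set(TRIPLES)
    emb = {e: [rng.gauss(0, 1) for _ in range(4)] for e in ENTITY_TYPE}
    rel = {r: [rng.gauss(0, 1) for _ in range(4)] for r in ("targets", "treats")}
    for pos in TRIPLES:
        neg = corrupt_tail(pos, ENTITY_TYPE, known, rng)
        if neg is None:
            continue  # no same-type entity is available to corrupt with
        print(pos, "->", neg, round(margin_loss(pos, neg, emb, rel), 3))
```

In this tiny graph the disease triple has no same-type replacement and is skipped; in a real biomedical KG each type has many members. Compared with uniform corruption, the constraint rules out trivially easy negatives such as "drugA targets disX", which would teach the model little.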

Paper Structure

This paper contains 24 sections, 17 equations, 2 figures, 4 tables, and 2 algorithms.

Figures (2)

  • Figure 1: KARL model. (A) Biomedical knowledge graph showing the links between drugs, target proteins, diseases, genes, biological pathways, and other entities. The edges represent different types of relationships, such as targeting, synergistic or unwanted interactions, and side effects. (B) We obtain knowledge graph embeddings (KGEs) for the target drug candidate (see the knowledge graph embedding section). (C) These embeddings guide a generative process (see the implementation section). (D) The end result of the generative process is a novel drug candidate.
  • Figure 2: Generative process. Starting from a random initialization, the diffusion model iteratively refines the molecular graph G until it converges to tropone, a valid non-benzenoid aromatic compound.
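Figure 2 depicts the reverse (denoising) direction of the diffusion model. As a complement, the sketch below illustrates the forward (noising) direction of a discrete graph diffusion, assuming a simple symmetric edge-flip kernel with a constant per-step flip probability β; the paper's actual noise schedule, node features, and learned denoiser are not shown. The graph is tropone's heavy-atom skeleton with bond orders dropped, and all function names are hypothetical. For this kernel, the probability that an edge keeps its original state after t steps is (1 + (1 - 2β)^t) / 2, which tends to 1/2, so the graph approaches uniform noise; a trained denoiser conditioned on the KG embedding would learn to reverse this process.

```python
import random


def tropone_adjacency():
    """Heavy-atom adjacency of tropone: a 7-carbon ring (atoms 0-6) plus a carbonyl O
    (atom 7) on atom 0. Bond orders are ignored; only bond presence (0/1) is kept."""
    n = 8
    adj = [[0] * n for _ in range(n)]
    bonds = [(i, (i + 1) % 7) for i in range(7)] + [(0, 7)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1
    return adj


def forward_noise(adj, beta, steps, rng):
    """Apply `steps` rounds of symmetric edge flipping; each edge flips with prob `beta` per step."""
    n = len(adj)
    noisy = [row[:] for row in adj]  # copy so the clean graph is untouched
    for _ in range(steps):
        for i in range(n):
            for j in range(i + 1, n):  # upper triangle only, mirrored to stay undirected
                if rng.random() < beta:
                    noisy[i][j] = noisy[j][i] = 1 - noisy[i][j]
    return noisy


def prob_unchanged(beta, steps):
    """Closed-form q(a_t = a_0) for a symmetric binary channel: (1 + (1 - 2*beta)**steps) / 2."""
    return (1 + (1 - 2 * beta) ** steps) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = tropone_adjacency()
    x_t = forward_noise(x0, beta=0.05, steps=100, rng=rng)
    print(sum(map(sum, x_t)) // 2, "edges after noising")
    print(prob_unchanged(0.05, 100))  # close to 0.5: edges are nearly independent of x0
```

The closed form follows because the per-step transition matrix [[1 - β, β], [β, 1 - β]] has eigenvalues 1 and 1 - 2β, which lets a sampler jump straight to any step t without simulating the intermediate steps.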