Target Specific De Novo Design of Drug Candidate Molecules with Graph Transformer-based Generative Adversarial Networks

Atabey Ünlü, Elif Çevrim, Melih Gökay Yiğit, Ahmet Sarıgün, Hayriye Çelikbilek, Osman Bayram, Deniz Cansen Kahraman, Abdurrahman Olğaç, Ahmet Sureyya Rifaioğlu, Erden Banoğlu, Tunca Doğan

TL;DR

DrugGEN introduces a graph transformer-based GAN to enable target-centric de novo design of drug-like molecules, trained on ChEMBL-derived bioactive molecules and evaluated against AKT1 (and CDK2). The system uses real molecular graphs as generator input, enhanced edge-aware attention, and a Wasserstein GAN framework to generate inhibitors with favorable docking and drug-likeness properties. Comprehensive validation includes MOSES benchmarking, docking to AKT1/CDK2, DEEPScreen-based DTI predictions, dimensionality-reduction space analyses, and MD simulations, followed by synthesis and in vitro AKT1 inhibition with two compounds showing low micromolar activity. The work demonstrates target-centric generation within a scalable, open-source pipeline and discusses limitations of GAN training, outlining future directions toward broader targets, fragment-based construction, and integrated target features.

Abstract

Discovering novel drug candidate molecules is one of the most fundamental and critical steps in drug development. Generative deep learning models, which create synthetic data given a probability distribution, offer a high potential for designing de novo molecules. However, to be utilisable in real-life drug development pipelines, these models should be able to design drug-like and target-centric molecules. In this study, we propose an end-to-end generative system, DrugGEN, for the de novo design of drug candidate molecules that interact with intended target proteins. The proposed method represents molecules as graphs and processes them via a generative adversarial network comprising graph transformer layers. The system is trained using a large dataset of drug-like compounds and target-specific bioactive molecules to design effective inhibitory molecules against the AKT1 protein, which is critically important in developing treatments for various types of cancer. We conducted molecular docking and dynamics to assess the target-centric generation performance of the model, as well as attention score visualisation to examine model interpretability. In parallel, selected compounds were chemically synthesised and evaluated in the context of in vitro enzymatic assays, which identified two bioactive molecules that inhibited AKT1 at low micromolar concentrations. These results indicate that DrugGEN's de novo molecules have a high potential for interacting with the AKT1 protein at the level of its native ligands. Using the open-access DrugGEN codebase, it is possible to easily train models for other druggable proteins, given a dataset of experimentally known bioactive molecules.

Paper Structure

This paper contains 32 sections, 8 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: The workflow of the study. (A) Preparation of datasets including molecules and bioactivities, together with the graph-based encoding of samples in datasets, (B) the graph transformer GAN-based architecture of DrugGEN, model training and evaluation via subjecting de novo molecules to fundamental benchmarks and drug-likeness-related metrics, and (C) subsequent selection of molecules via a series of in silico experiments to identify promising candidates that effectively target the selected protein, (D) the schematic representation of the architecture of the DrugGEN model with powerful graph transformer encoder modules in both generator and discriminator networks. The generator module transforms the given input into a new molecular representation. The discriminator compares the generated de novo molecules to the known inhibitors of the given target protein, scoring them for their assignment to the groups of “real” and “fake” molecules (abbreviations; MLP: multi-layer perceptron, Norm: normalisation, Concat: concatenation, MatMul: matrix multiplication, ElementMul: element-wise multiplication, Upd: updated, N: total # of heavy atoms, T: # of atom types, B: # of bond types, D: hidden dimension size).
  • Figure 2: Exploration of de novo molecules via downstream analysis. (A) Bar plots displaying the median of binding free energies measured in the docking analysis of de novo molecules generated by DrugGEN, RELATION, TargetDiff, Pocket2Mol, TRIOMPHE-BOA, and ResGen models (molecules are docked into the binding pocket of the AKT1/CDK2 protein structure). The whiskers on the bars represent the standard error of the median (median values are utilised due to the non-normal distribution of data). The percentages above each bar represent docking scores normalised with respect to the scores of real AKT1/CDK2 inhibitors, with the top real AKT1/CDK2 inhibitors set at 100%. (B) 2-D visualisation of the molecules generated by DrugGEN and DrugGEN-NoTarget models, real AKT1 inhibitors and randomly selected ChEMBL molecules via UMAP and t-SNE projections. (C) Deep learning-based bioactivity prediction analysis. Top: the workflow of DEEPScreen, which uses 2-D pixel-based (image) structural representations of molecules as input and processes them via deep convolutional neural networks (Rifaioglu et al., 2020). Bottom-left: the DEEPScreen AKT1 model is trained with binarised experimental bioactivity data points of the AKT1 protein (obtained from ChEMBL binding assays). Bottom-right: molecules (i.e., DrugGEN and DrugGEN-NoTarget molecules and randomly selected ChEMBL molecules) are submitted to the DEEPScreen AKT1 DTI prediction model for inference.
  • Figure 3: 30 promising de novo molecules for effectively targeting the AKT1 protein (generated by the DrugGEN model), selected via expert curation from the set of molecules that had sufficiently low binding free energies (< -8 kcal/mol) in the molecular docking experiment and were predicted as “active” by the deep learning-based DTI model (Rifaioglu et al., 2020).
  • Figure 4: Structural analysis of capivasertib (bound ligand in 4GV1) and five de novo generated molecules (Mol. ID 1–5) that were selected for experimental validation. (A) 2D visualisation of capivasertib and Molecules 1–5, with their unique identifiers assigned during molecule generation, QED and SA values. (B) 3D protein-ligand interactions of capivasertib and Molecules 1–5 after molecular docking with AKT1 (PDB ID: 4GV1). The interactions are visualised with active site residues of AKT1 highlighted in beige colour. Different interaction types are depicted as follows: blue lines represent hydrogen bonds, yellow dashed lines indicate salt bridges, gray dashed lines denote hydrophobic interactions, orange dashed lines highlight $\pi$-cation interactions, and green lines represent halogen bonds. (C) Initial binding orientations of capivasertib and Molecules 1 and 2 at the starting point of molecular dynamics (MD) simulations. (D) Key protein-ligand interactions observed during MD simulations, visualised with interacting residues and interaction types. The depicted poses represent the most populated conformations from each simulation. (E) Root-mean-square deviation (RMSD) values of capivasertib and Molecules 1 and 2 in complex with AKT1. (F) Root-mean-square fluctuation (RMSF) values of ligand atoms in the same complexes. Abbreviations: Exp. Mol. ID: experimental molecule identifier; I–VII: $\beta$-sheet numbers; g.l: glycine-rich loop; c.l: catalytic loop; GK: gatekeeper residue; xDFG: highly conserved kinase residues; linker: the loop that connects the hinge domain to the $\alpha$C-helix. Gray dashed lines represent Van der Waals interactions. Blue lines represent hydrogen bonds and water bridges. Green lines indicate halogen bonds. Yellow dashed lines represent salt bridges. Directional interactions are shown only if their occupancy was above 10%; for visual clarity, occupancy values of water bridges are stated only if they were above 30%.
  • Figure 5: Visualisation of DrugGEN attention maps on three de novo molecules. The left-side column depicts protein-ligand interaction diagrams obtained from the molecular docking of DrugGEN’s three de novo molecules (in A, C and E) with the AKT1 protein structure (PDB ID: 4GV1) (Addie et al., 2013). The docked ligands are located in the binding pocket, with interactions between residues and the ligand shown as lines, coloured by the interaction type. Protein residues are coloured according to their physicochemical properties. The right-side column depicts the attention maps of the same three de novo molecules (in B, D and F) retrieved from the graph transformer module of the DrugGEN generator network. Atoms that receive the highest attention scores are highlighted with colours (green: 1st, 2nd and 3rd atoms; yellow: 4th, 5th and 6th atoms; and red: 7th, 8th and 9th atoms with the highest attention scores). Atoms with lower attention scores are not coloured. In the right-side plots, receptor-ligand interactions (i.e., those obtained from the docking analysis) are represented by dashed lines in the attention-score-based colour of the molecule atom involved in the respective interaction. If an interacting atom could not be retrieved (i.e., the atom received a low attention score), its interaction is given in grey colour. (A, B) molecule ID: MOL_02_045597, docking score: -9.803 kcal/mol; (C, D) molecule ID: MOL_02_000496, docking score: -9.693 kcal/mol; (E, F) molecule ID: MOL_02_008350, docking score: -9.619 kcal/mol.
  • ...and 3 more figures
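
Two rules stated in the captions above are simple enough to sketch in code: the pre-filter described for Figure 3 (docking score below -8 kcal/mol and a DTI prediction of "active") and the colouring scheme described for Figure 5 (the nine atoms with the highest attention scores, in tiers of three). The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` record, the function names and the example values are hypothetical, and the expert curation step that follows the pre-filter is not modelled.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one generated molecule (not from the DrugGEN codebase)."""
    mol_id: str            # identifier assigned at generation time
    docking_score: float   # estimated binding free energy, kcal/mol
    dti_active: bool       # True if the DTI model predicts "active"


def select_candidates(cands: List[Candidate], max_score: float = -8.0) -> List[Candidate]:
    """Keep molecules scoring below max_score that are also predicted active.

    The result is sorted with the most favourable (most negative) score first.
    """
    kept = [c for c in cands if c.docking_score < max_score and c.dti_active]
    return sorted(kept, key=lambda c: c.docking_score)


# Colour order taken from the Figure 5 caption: ranks 1-3, 4-6, 7-9.
TIER_COLOURS = ("green", "yellow", "red")


def attention_tiers(scores: List[float], tier_size: int = 3) -> Dict[int, str]:
    """Map atom index to colour for the highest-scoring atoms; others stay uncoloured."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[: tier_size * len(TIER_COLOURS)]
    return {atom: TIER_COLOURS[rank // tier_size] for rank, atom in enumerate(top)}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    pool = [
        Candidate("mol_a", -9.1, True),
        Candidate("mol_b", -8.4, False),   # rejected: predicted inactive
        Candidate("mol_c", -7.5, True),    # rejected: score not below -8
    ]
    print([c.mol_id for c in select_candidates(pool)])
    print(attention_tiers([0.10, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.05]))
```

Atoms absent from the returned dictionary correspond to the uncoloured atoms in the attention maps. Ties between equal scores are broken by atom index here, which is an arbitrary choice of this sketch.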