Hybrid Generative AI for De Novo Design of Co-Crystals with Enhanced Tabletability

Nina Gubina, Andrei Dmitrenko, Gleb Solovev, Lyubov Yamshchikova, Oleg Petrov, Ivan Lebedev, Nikita Serov, Grigorii Kirgizov, Nikolay Nikitin, Vladimir Vinogradov

TL;DR

This work presents GEMCODE, a novel pipeline for automated co-crystal screening that hybridizes deep generative models with evolutionary optimization to explore the target chemical space more broadly. It also explores the potential of language models in generating co-crystals.

Abstract

Co-crystallization is an accessible way to control the physicochemical characteristics of organic crystals and has many biomedical applications. In this work, we present Generative Method for Co-crystal Design (GEMCODE), a novel pipeline for automated co-crystal screening that hybridizes deep generative models with evolutionary optimization to explore the target chemical space more broadly. GEMCODE enables fast de novo design of co-crystals with target tabletability profiles, which is crucial for the development of pharmaceuticals. With a series of experimental studies covering validation and discovery cases, we show that GEMCODE is effective even under realistic computational constraints. Furthermore, we explore the potential of language models in generating co-crystals. Finally, we present numerous previously unknown co-crystals predicted by GEMCODE and discuss its potential for accelerating drug development.

Paper Structure

This paper contains 76 sections, 14 equations, 14 figures, 11 tables.

Figures (14)

  • Figure 1: GEMCODE: a pipeline for generative co-crystal design consisting of models (LSTM-based GAN, T-VAE, T-CVAE) generating coformer candidates, gradient boosting (GB) classification models predicting the mechanical properties of co-crystals based on the generated coformers, an evolutionary algorithm producing additional coformer candidates with improved tabletability profiles, and a graph neural network (GNN) ranking co-crystals according to the probability of formation.
  • Figure 2: Accuracy and F1 score metrics for the ML models predicting three mechanical properties of co-crystals. (a) Unobstructed planes. (b) Orthogonal planes. (c) H-bond bridging. The performance of each model is shown before (“Raw data”) and after (“Processed data”) the feature engineering and feature selection steps.
  • Figure 3: (a) Schematic representation of the mechanical properties of co-crystals. No slip plane and H-bond bridging are associated with low tabletability. The other two properties positively correlate with tabletability. (b) Schematic representation of the particle deformation during powder compression. (c) Number of coformer samples of each category per mechanical property.
  • Figure 4: Molecular representation using the chemical structure of caffeine as an example in the form of SMILES, molecular fingerprints, and molecular descriptors.
  • Figure 5: GAN training results on the ChEMBL dataset and coformers: (a) growth of the share of valid chemical structures per batch during training, (b) t-SNE visualization of molecules from the ChEMBL dataset and coformers.
  • ...and 9 more figures
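
Figure 2 reports accuracy and F1 score for the classifiers of each mechanical property. As a reminder of how these two metrics relate for a binary label, the following is a minimal sketch in plain Python. The function name `accuracy_and_f1` and the label lists are hypothetical and chosen for illustration only; they are not taken from the paper, and the authors' own evaluation code may differ.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 is the positive class.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the confusion-matrix cells needed by the two metrics.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)

    # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when there are
    # no positive labels and no positive predictions.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up labels for one property (e.g. 1 = property present, 0 = absent).
example_true = [1, 0, 1, 1, 0, 0]
example_pred = [1, 0, 0, 1, 1, 0]
example_accuracy, example_f1 = accuracy_and_f1(example_true, example_pred)
```

Reporting both metrics is useful when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.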