De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning
Chen Li, Yoshihiro Yamanishi
TL;DR
This work tackles de novo design of hit-like molecules by conditioning chemical generation on gene expression responses. It presents HNN2Mol, a two-stage framework that uses a variational autoencoder (VAE) to extract latent biological features from expression profiles and a conditional LSTM to generate SMILES strings guided by those features. The generated molecules are valid, unique, and novel, and they preserve drug-likeness (QED) and synthesizability (SA). Case studies based on disease-reversal profiles show therapeutic relevance for gastric cancer, dermatitis, and Alzheimer's disease. The method advances omics-guided drug design by translating cellular response signals directly into chemically plausible candidates, with the potential to accelerate lead discovery.
Abstract
De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most previous methods learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account how biological systems, consisting of genes and proteins, respond to drugs. In this study, we propose a hybrid neural network, HNN2Mol, which uses gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder (VAE) serves as a feature extractor that learns the latent feature distribution of the gene expression profiles. A long short-term memory (LSTM) network then serves as the chemical generator, producing syntactically valid SMILES strings that satisfy the feature conditions extracted from the gene expression profile. Experimental results and case studies demonstrate that the proposed HNN2Mol model can produce new molecules with potential bioactivities and drug-like properties.
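To make the feature-extractor step more concrete, here is a minimal sketch of the two pieces a standard VAE relies on: the reparameterized sampling of a latent vector and the closed-form KL term against a standard normal prior. This is the textbook VAE formulation, not necessarily the exact objective used in HNN2Mol. The function names and the toy values of `mu` and `log_var` are assumptions made for illustration; in practice an encoder network would produce them from a gene expression profile.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    mu and log_var stand in for the encoder outputs (hypothetical values here).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


# Toy encoder outputs for a 4-dimensional latent space (illustrative only).
mu = [0.5, -1.0, 0.0, 2.0]
log_var = [0.0, -0.5, 0.2, -1.0]

rng = random.Random(0)  # seeded so the sample is reproducible
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)

print("latent sample:", z)
print("KL term:", kl)
```

In a pipeline of the kind the abstract describes, a latent vector such as `z` would be passed to the LSTM generator as the condition for producing a SMILES string; how that conditioning is wired is not specified in this section.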
