A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation
Xiangru Tang, Howard Dai, Elizabeth Knight, Fang Wu, Yunyang Li, Tianxiao Li, Mark Gerstein
TL;DR
The survey catalogues how generative AI accelerates de novo drug design, organizing methods into small-molecule and protein design and detailing representative architectures (VAEs, GANs, flows, diffusion) and their integration with graph and geometric representations. It systematically maps task definitions, datasets, metrics, and state-of-the-art models across target-agnostic and target-aware molecular design, and across protein design tasks including representation learning, structure and sequence generation, and antibody-focused subfields. The work highlights diffusion models and equivariant graph networks as dominant recent trends, and emphasizes practical challenges such as standardized benchmarking, breadth of validation, and explainability. It also points to future directions, including more realistic evaluation pipelines, richer multimodal representations, and closer alignment with experimental validation to enable reliable, scalable drug discovery. The accompanying repository complements the survey by offering organized access to the covered sources.
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
