Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules versus Therapeutic Peptides
Yiquan Wang, Yahui Ma, Yuhan Chang, Jiayao Yan, Jialin Zhang, Minnuo Cai, Kai Wei
TL;DR
This review analyzes how diffusion models are transforming drug discovery by enabling de novo design of both small molecules and therapeutic peptides. It contrasts modality-specific representations, benchmarks, and design objectives, highlighting representative systems such as Pocket2Mol, DiffSBDD, TargetDiff, RFdiffusion, and ProteinMPNN. The authors underscore data scarcity, the unreliability of scoring functions, and the necessity of experimental validation, arguing for integrated, automated Design-Build-Test-Learn (DBTL) pipelines to realize on-demand therapeutic design. The work articulates practical implications for accelerating discovery while outlining key challenges and opportunities to bridge computational designs with real-world synthesis and biology. Overall, diffusion models offer a promising path to shift from broad chemical exploration to targeted, efficient engineering of novel therapeutics within an automated discovery framework.
Abstract
Diffusion models have emerged as a leading framework in generative modeling, poised to transform the traditionally slow and costly process of drug discovery. This review provides a systematic comparison of their application in designing two principal therapeutic modalities: small molecules and therapeutic peptides. We dissect how the unified framework of iterative denoising is adapted to the distinct molecular representations, chemical spaces, and design objectives of each modality. For small molecules, these models excel at structure-based design, generating novel, pocket-fitting ligands with desired physicochemical properties, yet face the critical hurdle of ensuring chemical synthesizability. Conversely, for therapeutic peptides, the focus shifts to generating functional sequences and designing de novo structures, where the primary challenges are achieving biological stability against proteolysis, ensuring proper folding, and minimizing immunogenicity. Despite these distinct challenges, both domains face shared hurdles: the scarcity of high-quality experimental data, the reliance on inaccurate scoring functions for validation, and the crucial need for experimental validation. We conclude that the full potential of diffusion models will be unlocked by bridging these modality-specific gaps and integrating them into automated, closed-loop Design-Build-Test-Learn (DBTL) platforms, thereby shifting the paradigm from mere chemical exploration to the on-demand engineering of novel therapeutics.
