
MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

TL;DR

MolCRAFT tackles the critical problem of false positives in structure-based drug design by moving from hybrid discrete-continuous diffusion to a fully continuous parameter-space framework. It employs SE(3)-equivariant networks and a noise-reduced sampling strategy within Bayesian flow dynamics to jointly model atom coordinates and atom types, improving 3D pose feasibility and binding interactions. On CrossDocked, MolCRAFT achieves a reference-level Vina Score of $-6.59$ kcal/mol with comparable molecular size, shows superior conformational stability and sampling efficiency, and outperforms strong autoregressive and diffusion baselines by a wide margin (a 0.84 kcal/mol lower Vina Score). The work provides a practical, scalable approach for generating realistic, high-affinity ligands directly in 3D space, with the potential to accelerate structure-based drug discovery.

Abstract

Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on generating molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and thus producing false positives. We conduct thorough studies on the key factors behind ill-conformed poses when applying autoregressive and diffusion methods to SBDD, including mode collapse and the hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise-reduced sampling strategy. Empirical results show that our model consistently achieves superior binding affinity with more stable 3D structures, demonstrating its ability to accurately model interatomic interactions. To the best of our knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT.
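To make "continuous parameter space" concrete, the sketch below assumes MolCRAFT builds on the standard Bayesian Flow Network (BFN) update rules of Graves et al. (2023). Coordinates are represented by the mean and precision of a Gaussian belief, atom types by a categorical probability vector, and both are refined in closed form from noisy "sender" observations, so even the discrete atom types live in a continuous space. The function names, the constant accuracies `alpha_x`/`alpha_v`, and the toy data are hypothetical simplifications; the real model uses accuracy schedules and, at sampling time, replaces the true data with the network's prediction.

```python
import numpy as np

rng = np.random.default_rng(0)


def continuous_update(mu, rho, y, alpha):
    """Conjugate Gaussian update: belief N(mu, 1/rho), observation y ~ N(x, 1/alpha)."""
    rho_new = rho + alpha
    mu_new = (rho * mu + alpha * y) / rho_new
    return mu_new, rho_new


def discrete_update(theta, y):
    """Categorical update theta' proportional to exp(y) * theta, per atom over K types."""
    logits = np.log(theta) + y
    logits -= logits.max(axis=-1, keepdims=True)  # stabilise before exponentiating
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)


def bayesian_flow(x, types, num_types, steps=50, alpha_x=2.0, alpha_v=0.5):
    """Run `steps` updates with sender samples drawn from the true data.

    x: (n, 3) coordinates; types: (n,) integer atom-type indices.
    Constant accuracies alpha_x / alpha_v are a hypothetical simplification.
    """
    n = x.shape[0]
    mu, rho = np.zeros_like(x), 1.0                    # prior N(0, I) on coordinates
    theta = np.full((n, num_types), 1.0 / num_types)   # uniform prior on atom types
    onehot = np.eye(num_types)[types]
    for _ in range(steps):
        # Coordinates: sender sample y ~ N(x, 1/alpha_x)
        y_x = x + rng.normal(scale=1.0 / np.sqrt(alpha_x), size=x.shape)
        mu, rho = continuous_update(mu, rho, y_x, alpha_x)
        # Atom types: sender sample y ~ N(alpha_v * (K * e_x - 1), alpha_v * K * I)
        mean_v = alpha_v * (num_types * onehot - 1.0)
        y_v = mean_v + rng.normal(scale=np.sqrt(alpha_v * num_types), size=onehot.shape)
        theta = discrete_update(theta, y_v)
    return mu, rho, theta


x_true = rng.normal(scale=3.0, size=(5, 3))
types_true = np.array([0, 1, 2, 1, 3])
mu, rho, theta = bayesian_flow(x_true, types_true, num_types=4)
print(np.abs(mu - x_true).max(), rho, theta.argmax(axis=-1))
```

As observations accumulate, the coordinate precision grows linearly and the means approach the true positions, while the type probabilities concentrate on the true classes; both are updated with the same smooth, closed-form machinery.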

Paper Structure

This paper contains 41 sections, 1 theorem, 22 equations, 11 figures, 7 tables, and 2 algorithms.

Key Result

Proposition 4.1

Denote the SE(3) transformation as $T_g$. The likelihood is invariant w.r.t. $T_g$ on the protein-ligand complex, $p_\phi(T_g(\mathbf{m} \mid \mathbf{p})) = p_\phi(\mathbf{m} \mid \mathbf{p})$, if we shift the Center of Mass (CoM) of protein atoms to zero and parameterize $\boldsymbol{\Phi}(\boldsymbol{\theta}, \ldots)$ …
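The proposition rests on a few ingredients that can be checked numerically. Re-centring the whole complex on the protein CoM cancels any global translation and leaves only a rotation; a zero-centred isotropic Gaussian density is unchanged by that rotation; and an equivariant update commutes with it. The sketch below illustrates these ingredients only. It is not MolCRAFT's network, it assumes uniform atom masses for the CoM, and `toy_equivariant_layer` and the other helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_rotation():
    """Draw a random proper rotation (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the factorisation is unique
    if np.linalg.det(q) < 0:         # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q


def center_on_protein(ligand, protein):
    """Shift the complex so the protein centre of mass (uniform masses) is at zero."""
    com = protein.mean(axis=0)
    return ligand - com, protein - com


def isotropic_gaussian_logpdf(x):
    """log N(x; 0, I): depends only on ||x||, hence rotation invariant."""
    return -0.5 * np.sum(x ** 2) - 0.5 * x.size * np.log(2.0 * np.pi)


def toy_equivariant_layer(ligand, protein):
    """Hypothetical EGNN-style update: move each ligand atom along relative
    vectors to every atom in the complex, weighted by a function of distance."""
    context = np.vstack([ligand, protein])
    diff = ligand[:, None, :] - context[None, :, :]        # (n, n + m, 3)
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)    # rotation invariant
    return ligand + (diff * np.exp(-dist)).sum(axis=1)


ligand = rng.normal(size=(6, 3))
protein = rng.normal(loc=5.0, size=(30, 3))
R, t = random_rotation(), rng.normal(size=3)

# Apply the same rigid motion to the whole complex, then re-centre on the protein.
lig_c, prot_c = center_on_protein(ligand, protein)
lig_g, prot_g = center_on_protein(ligand @ R.T + t, protein @ R.T + t)

translation_removed = np.allclose(lig_g, lig_c @ R.T)
prior_invariant = np.isclose(isotropic_gaussian_logpdf(lig_g),
                             isotropic_gaussian_logpdf(lig_c))
layer_equivariant = np.allclose(toy_equivariant_layer(lig_g, prot_g),
                                toy_equivariant_layer(lig_c, prot_c) @ R.T)
print(translation_removed, prior_invariant, layer_equivariant)
```

Centring on the protein rather than the ligand is a natural choice here, since the protein pocket is the fixed conditioning input and is available before any ligand atoms are generated.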

Figures (11)

  • Figure 1: Typical failure modes. (a) Unusual 3-membered rings generated by autoregressive (AR) models, and large fused rings with more than 7 atoms generated by diffusion models. (b) Examples of steric clashes by FLAG, and another ligand undergoing significant conformational rearrangements upon redocking (Before: blue. After: green). (c) Failures in the generation process. Left: atoms mis-connected in autoregressive sampling. Right: incomplete molecules with multiple components.
  • Figure 2: Bond length distributions of reference molecules and of molecules generated by autoregressive models (upper row) and non-autoregressive models (lower row) for the five most frequent bond types.
  • Figure 3: Percentage of valid, complete molecules in the trajectories during the generative process.
  • Figure 4: Overall Architecture.
  • Figure 5: Sample efficiency, where Generation Success means the generated molecules are both valid and complete.
  • ...and 6 more figures
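The Generation Success metric in Figures 3 and 5 counts a molecule only if it is both valid and complete, and Figure 1(c) identifies incompleteness with a molecule split into multiple components. A minimal sketch of that metric follows, assuming validity is supplied by an external chemistry check (for example a cheminformatics toolkit's sanitization step) and completeness means the bond graph forms a single connected component. The `(is_valid, num_atoms, bonds)` input format and the function names are hypothetical.

```python
def is_complete(num_atoms, bonds):
    """True if the bond graph forms a single connected component (union-find)."""
    parent = list(range(num_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return num_atoms > 0 and len({find(i) for i in range(num_atoms)}) == 1


def generation_success_rate(samples):
    """Fraction of samples that are both valid and complete.

    Each sample is a hypothetical (is_valid, num_atoms, bonds) tuple, where
    is_valid comes from an external chemistry check not implemented here.
    """
    if not samples:
        return 0.0
    successes = sum(
        1 for is_valid, num_atoms, bonds in samples
        if is_valid and is_complete(num_atoms, bonds)
    )
    return successes / len(samples)


samples = [
    (True, 3, [(0, 1), (1, 2)]),   # valid and connected: success
    (True, 3, [(0, 1)]),           # atom 2 is a separate fragment: incomplete
    (False, 2, [(0, 1)]),          # connected but flagged invalid
]
print(generation_success_rate(samples))  # one of the three samples succeeds
```

Applying the same function to intermediate states along a sampling trajectory gives the kind of per-step percentage plotted in Figure 3.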

Theorems & Definitions (1)

  • Proposition 4.1