MagicDock: Toward Docking-oriented De Novo Ligand Design via Gradient Inversion

Zekai Chen, Xunkai Li, Sirui Zhang, Henan Sun, Jia Li, Zhenjun Li, Bing Zhou, Rong-Hua Li, Guoren Wang

TL;DR

MagicDock tackles the challenge of de novo ligand design by introducing a gradient-inversion framework that injects docking knowledge into a docking-aware backbone, coupled with differentiable surface modeling via learnable 3D point clouds. The method unfolds in four stages—docking-oriented ligand modeling, unsupervised pre-training, supervised fine-tuning, and inversion-based ligand generation—achieving SE(3)-equivariant representations and end-to-end gradient-based generation for both protein and small-molecule ligands. Theoretical guarantees cover SE(3)-equivariance, convergence of projected gradient descent, and information-theoretic optimality, while experiments across nine scenarios show substantial performance gains over state-of-the-art baselines, improved interpretability via gradient attribution, and favorable efficiency and scalability. The approach promises practical impact for structure-based drug design by enabling more accurate, diverse, and efficient de novo ligand discovery, with clear paths for extension to broader chemical spaces and multi-objective optimization.

Abstract

De novo ligand design is a fundamental task that seeks to generate, entirely from scratch, protein or molecule candidates that dock effectively with protein receptors and achieve strong binding affinity. It holds paramount significance for a wide spectrum of biomedical applications. However, most existing studies are constrained by three limitations: Pseudo De Novo, Limited Docking Modeling, and Inflexible Ligand Type. To address these issues, we propose MagicDock, a forward-looking framework grounded in a progressive pipeline and differentiable surface modeling. (1) We adopt a well-designed gradient inversion framework. First, general docking knowledge of receptors and ligands is incorporated into the backbone model. Subsequently, this docking knowledge is instantiated as reverse gradient flows through binding prediction, which iteratively guide the de novo generation of ligands. (2) We emphasize differentiable surface modeling in the docking process, leveraging learnable 3D point-cloud representations to precisely capture binding details, thereby ensuring that the generated ligands preserve docking validity through direct and interpretable spatial fingerprints. (3) We introduce customized designs for different ligand types and integrate them into a unified gradient inversion framework with flexible triggers, thereby ensuring broad applicability. Moreover, we provide rigorous theoretical guarantees for each component of MagicDock. Extensive experiments across 9 scenarios demonstrate that MagicDock achieves average improvements of 27.1% and 11.7% over SOTA baselines specialized for protein and molecule ligand design, respectively.
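
The gradient inversion idea in point (1) can be illustrated with a minimal sketch: a scoring function is held fixed, and its gradient with respect to the ligand coordinates is used to update the ligand itself. Everything below is an assumption made for illustration and is not MagicDock's actual model: `binding_score` is a toy Gaussian-contact score standing in for a trained binding predictor, `invert` is a hypothetical name, and no chemical-validity projection is applied.

```python
import numpy as np

def binding_score(lig, rec, sigma=2.0):
    """Toy stand-in for a frozen binding predictor (an assumption, not the
    paper's model): sum of Gaussian contacts between ligand and receptor points."""
    diff = lig[:, None, :] - rec[None, :, :]           # (n_lig, n_rec, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                # (n_lig, n_rec)
    return float(np.sum(np.exp(-sq_dist / (2.0 * sigma ** 2))))

def score_gradient(lig, rec, sigma=2.0):
    """Analytic gradient of binding_score with respect to ligand coordinates."""
    diff = lig[:, None, :] - rec[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))     # (n_lig, n_rec)
    return -np.sum(weights[:, :, None] * diff, axis=1) / sigma ** 2

def invert(lig0, rec, steps=200, lr=0.05):
    """Gradient ascent on the input: the scorer stays fixed, the ligand moves."""
    lig = lig0.copy()
    history = [binding_score(lig, rec)]
    for _ in range(steps):
        lig = lig + lr * score_gradient(lig, rec)
        history.append(binding_score(lig, rec))
    return lig, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    receptor = rng.normal(size=(20, 3))                 # toy receptor surface points
    ligand0 = rng.normal(size=(5, 3)) + 3.0             # toy ligand, initially offset
    ligand, scores = invert(ligand0, receptor)
    print(f"score before: {scores[0]:.3f}, after: {scores[-1]:.3f}")
```

The step size is deliberately small relative to the curvature of the toy score, so each update does not decrease it. A real pipeline would replace the analytic gradient with automatic differentiation through the trained predictor.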

Paper Structure

This paper contains 71 sections, 17 theorems, 105 equations, 13 figures, 12 tables, 5 algorithms.

Key Result

Theorem 1

Given the receptor point cloud $P_{\text{rec}}$ and the initial ligand point cloud $P_{\text{lig}}^0$, let the optimized ligand be $P_{\text{lig}}^* = \mathrm{MagicDock}(P_{\text{rec}}, P_{\text{lig}}^0)$. Under the assumptions that (i) the chemical validity set $\mathcal{C}_{\text{valid}}$ admits a where $g \cdot P := RP+t$ for $R \in \mathrm{SO(3)}$, $t \in \mathbb{R}^3$ acts on coordinates, whi
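
The group action $g \cdot P := RP+t$ used in the theorem statement can be checked numerically on a toy map. The sketch below is only an illustration under stated assumptions: `toy_update` is a hypothetical distance-weighted update, not any stage of MagicDock, and the check measures whether applying $g$ before or after the update gives the same point cloud.

```python
import numpy as np

def random_rotation(rng):
    """Sample a proper rotation (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))         # fix the sign ambiguity of each column
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]              # flip one axis to land in SO(3)
    return q

def act(points, rot, trans):
    """Group action g . P = R P + t, applied row-wise to an (n, 3) array."""
    return points @ rot.T + trans

def toy_update(lig, rec, lr=0.1, sigma=2.0):
    """Hypothetical update (an assumption): each ligand point moves toward
    receptor points, weighted by a function of pairwise distance only."""
    diff = lig[:, None, :] - rec[None, :, :]            # rotates with the inputs
    sq_dist = np.sum(diff ** 2, axis=-1)                # invariant under g
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return lig - lr * np.sum(weights[:, :, None] * diff, axis=1)

def equivariance_gap(lig, rec, rot, trans):
    """Largest coordinate difference between f(g.P) and g.f(P)."""
    moved_then_updated = toy_update(act(lig, rot, trans), act(rec, rot, trans))
    updated_then_moved = act(toy_update(lig, rec), rot, trans)
    return float(np.max(np.abs(moved_then_updated - updated_then_moved)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rot, trans = random_rotation(rng), rng.normal(size=3)
    receptor, ligand = rng.normal(size=(12, 3)), rng.normal(size=(4, 3))
    print("equivariance gap:", equivariance_gap(ligand, receptor, rot, trans))
```

The gap is zero up to floating-point error because the update depends on the inputs only through coordinate differences, which rotate with $R$, and pairwise distances, which do not change under $g$.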

Figures (13)

  • Figure 1: Comparison between existing methods and MagicDock. This figure describes the three limitations of existing methods and presents the MagicDock framework, which achieves authentic de novo generation, biological significance, and cross-category generality.
  • Figure 2: The overview of MagicDock Framework.
  • Figure 3: Performance comparison of attribution methods for interpretability, evaluated on Enrichment @k (1%, 5%, 10%), AUPRC, AUROC, and Spearman correlation.
  • Figure 4: Visualization of an example of generated protein-ligand complexes.
  • Figure 5: Comparison of the inversion framework with other generative ligand design methods.
  • ...and 8 more figures

Theorems & Definitions (37)

  • Theorem 1: SE(3)-Equivariance of MagicDock
  • Lemma 1.1: Stage 1: Surface Point Cloud Modeling
  • proof
  • Lemma 1.2: Stage 2: Pre-training with Equivariant Encoder
  • proof
  • Lemma 1.3: Stage 3: Supervised Fine-tuning with Equivariant Attention
  • proof
  • Lemma 1.4: Stage 4: Inversion-based Generation
  • proof
  • proof: Proof of Theorem
  • ...and 27 more