
SigmaDock: Untwisting Molecular Docking With Fragment-Based SE(3) Diffusion

Alvaro Prat, Leo Zhang, Charlotte M. Deane, Yee Whye Teh, Garrett M. Morris

TL;DR

SigmaDock is the first deep learning approach to surpass classical physics-based docking under the PB train-test split, marking a significant leap forward in the reliability and feasibility of deep learning for molecular modelling.

Abstract

Determining the binding pose of a ligand to a protein, known as molecular docking, is a fundamental task in drug discovery. Generative approaches promise faster, improved, and more diverse pose sampling than physics-based methods, but are often hindered by chemically implausible outputs, poor generalisability, and high computational cost. To address these challenges, we introduce a novel fragmentation scheme, leveraging inductive biases from structural chemistry, to decompose ligands into rigid-body fragments. Building on this decomposition, we present SigmaDock, an SE(3) Riemannian diffusion model that generates poses by learning to reassemble these rigid bodies within the binding pocket. By operating at the level of fragments in SE(3), SigmaDock exploits well-established geometric priors while avoiding overly complex diffusion processes and unstable training dynamics. Experimentally, we show that SigmaDock achieves state-of-the-art performance, reaching Top-1 success rates (RMSD < 2 Å and PB-valid) above 79.9% on the PoseBusters set, compared to 12.7–30.8% reported by recent deep learning approaches, whilst demonstrating consistent generalisation to unseen proteins. SigmaDock is the first deep learning approach to surpass classical physics-based docking under the PB train-test split, marking a significant leap forward in the reliability and feasibility of deep learning for molecular modelling.
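The headline metric counts a complex as a success only when its top-ranked pose is both within 2 Å RMSD of the crystal pose and PB-valid. A minimal sketch of that bookkeeping, assuming atoms are already matched one-to-one, both poses share the protein's frame (no superposition), and the PB-valid flag comes from an external validity check. `rmsd` and `top1_success_rate` are illustrative names, and the symmetry-corrected RMSD that docking benchmarks commonly use is omitted here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """RMSD (Å) between two poses whose atoms are matched index-by-index.

    No superposition (docking poses are compared in the protein frame) and
    no symmetry correction: both are simplifying assumptions of this sketch.
    """
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(pred))


def top1_success_rate(
    results: List[Tuple[Sequence[Coord], Sequence[Coord], bool]],
    threshold: float = 2.0,
) -> float:
    """Fraction of complexes whose top-ranked pose has RMSD < threshold AND is PB-valid.

    Each entry is (top_ranked_pose, crystal_pose, pb_valid); pb_valid is assumed
    to be precomputed by an external validity check.
    """
    hits = sum(1 for pred, ref, ok in results if ok and rmsd(pred, ref) < threshold)
    return hits / len(results)


# Usage: one success, one valid-but-far pose, one close-but-invalid pose.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
close = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # RMSD = 0.5 Å
far = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]    # RMSD = 3.0 Å
results = [(close, ref, True), (far, ref, True), (close, ref, False)]
print(top1_success_rate(results))  # 1 of 3 complexes succeeds
```

Requiring both conditions is what separates this metric from plain RMSD success: a geometrically close but chemically invalid pose does not count.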

Paper Structure

This paper contains 91 sections, 4 theorems, 69 equations, 12 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

For standard molecular topologies, torsional models define nonlinear mappings from torsion angles to Cartesian coordinates, producing highly entangled, non-product induced measures. In contrast, disjoint rigid fragments yield a factorised product of Haar measures on $\mathrm{SE}(3)^m$.
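Theorem 1's factorisation means each rigid fragment can be perturbed by its own roto-translation, independently of every other fragment. A minimal sketch, assuming fragments are given as lists of 3D coordinates: rotations are drawn uniformly (Haar) on SO(3) via normalised Gaussian quaternions, while translations use an isotropic Gaussian with a hypothetical scale `sigma_t`, because the translational Haar (Lebesgue) measure on $\mathbb{R}^3$ cannot be normalised into a distribution. This is a one-shot perturbation for illustration, not SigmaDock's actual noise schedule; `haar_rotation` and `perturb_fragments` are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float, float]
Mat = List[List[float]]


def haar_rotation(rng: random.Random) -> Mat:
    """Uniform (Haar) random rotation on SO(3).

    A normalised 4D Gaussian is a uniform unit quaternion; mapping it to a
    rotation matrix yields the Haar measure on SO(3).
    """
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def perturb_fragments(
    fragments: List[List[Vec]], sigma_t: float, rng: random.Random
) -> List[List[Vec]]:
    """Apply an independent roto-translation x -> R (x - c) + c + t to each fragment.

    R is Haar on SO(3), t ~ N(0, sigma_t^2 I), and c is the fragment centroid,
    so each fragment rotates about its own centre. One draw per fragment.
    """
    noisy = []
    for frag in fragments:
        c = [sum(p[k] for p in frag) / len(frag) for k in range(3)]
        R = haar_rotation(rng)
        t = [rng.gauss(0.0, sigma_t) for _ in range(3)]
        moved = []
        for p in frag:
            d = [p[k] - c[k] for k in range(3)]
            moved.append(tuple(
                sum(R[i][j] * d[j] for j in range(3)) + c[i] + t[i] for i in range(3)
            ))
        noisy.append(moved)
    return noisy


# Usage: two toy fragments. Distances inside each fragment are preserved,
# while distances between fragments generally change.
rng = random.Random(0)
frags = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.4, 0.0)],
    [(3.0, 0.0, 0.0), (4.2, 0.3, 0.0), (3.0, 1.0, 1.0)],
]
noisy = perturb_fragments(frags, sigma_t=1.0, rng=rng)
```

Because each fragment only ever sees an element of SE(3), its internal geometry is untouched, which is the practical payoff of working on $\mathrm{SE}(3)^m$ rather than in torsion space.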

Figures (12)

  • Figure 1: Illustration of SigmaDock using PDB 1V4S and ligand MRK. We create an initial conformation of a query ligand and define its $m$ rigid-body fragments (colour coded). The corresponding forward diffusion process operates in $\mathrm{SE}(3)^m$ via independent roto-translations.
  • Figure 2: A: Illustration of a dihedral $\phi_{ABCD}$ about torsional bond $\overline{BC}$, defined as the angle between planes $ABC$ and $BCD$, across two adjacent benzene rings in ligand BFL; B: Bound (red) and aligned (green) poses for BFL in PDB 1Q4G, with an optimised alignment RMSD of 0.11 Å; C: Conformational ensembles generated from $\pi_{\mathcal{M}_c}$ for ligands SKF, CEL, and IH5, respectively. Notably, the largest structural changes arise from torsions about the rotatable bonds.
  • Figure 3: Illustrative example of how FR3D reduces the number of fragments (colour coded) required to represent the rigid bodies of ligand TNK, yielding an irreducible form. A: Fragments defined by snapping all torsional bonds (ribbons); B: FR3D recursively attempts to reduce the $k$ torsional bonds, removing in the process over-constrained dummies (denoted by the coloured rings), which would otherwise define a dihedral across the merged fragment; C: Over-constrained dummies removed and triangulation edges displayed under a different stochastic reduction (equiprobable to solution B).
  • Figure 4: Performance benchmarks. Left: Comparative performance of SigmaDock against prior methods on the PoseBusters (PB) and Astex Diverse (AX) sets, with baseline results extracted from Abramson et al. (2024) and Buttenschoen et al. (2024). (*) denotes classical docking methods; (**) denotes methods that are not open-source. Right: Performance breakdown by sequence-similarity split on the PB set.
  • Figure 5: Visual helper for Lemma 1 (triangulation), showing the corresponding atoms of fragment $\mathcal{A}$ (blue) and fragment $\mathcal{D}$ (red). For additional clarity, dummy atoms across torsional bond $\overline{BC}$ are marked with an asterisk.
  • ...and 7 more figures
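Two geometric primitives from the captions above can be sketched directly: the signed dihedral $\phi_{ABCD}$ about bond $\overline{BC}$ (Figure 2A), and the naive fragmentation of Figure 3A, where every torsional bond is snapped and the remaining connected components become rigid bodies. Assumptions: which bonds count as torsional is supplied as input (in practice a rotatable-bond rule would decide this), and the sketch stops at step A, with none of FR3D's recursive merging or dummy atoms. `dihedral` and `snap_torsional_bonds` are hypothetical names.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral phi_ABCD about bond B-C, in degrees within [-180, 180]."""
    b0, b1, b2 = _sub(a, b), _sub(c, b), _sub(d, c)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project B->A and C->D onto the plane perpendicular to the B-C axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def snap_torsional_bonds(
    n_atoms: int, bonds: Iterable[Tuple[int, int]], torsional: Iterable[Tuple[int, int]]
) -> List[List[int]]:
    """Delete every torsional bond; return connected components (rigid fragments)."""
    cut = {frozenset(bd) for bd in torsional}
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for u, v in bonds:
        if frozenset((u, v)) not in cut:
            adj[u].append(v)
            adj[v].append(u)
    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # iterative depth-first search over non-torsional bonds
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        fragments.append(sorted(comp))
    return fragments


# Usage: a butane-like heavy-atom chain 0-1-2-3 whose central bond is torsional.
print(snap_torsional_bonds(4, [(0, 1), (1, 2), (2, 3)], [(1, 2)]))  # [[0, 1], [2, 3]]
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))  # 90.0
```

Snapping every torsional bond tends to over-fragment a ligand (Figure 3A), which is the redundancy FR3D's reduction step is designed to remove.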

Theorems & Definitions (9)

  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Proposition 1
  • Proof
  • Proof
  • Proof
  • Proof
  • Proof