Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design

Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola

TL;DR

This work introduces HarmonicFlow, a self-conditioned flow-matching model with a harmonic prior for 3D docking of single- and multi-ligand complexes, and FlowSite, a joint discrete-continuous generative model that designs protein binding pockets by predicting residue identities and ligand poses simultaneously. HarmonicFlow outperforms state-of-the-art diffusion-based docking methods on pocket-level docking and serves as the structure-modeling basis for FlowSite, which adds residue-type generation and a training regime that includes fake-ligand augmentation and multiple loss terms. FlowSite achieves substantially better binding-site recovery than baselines and approaches an oracle level without access to ground-truth ligand structures, in both single- and multi-ligand settings. The architecture combines SE(3)-equivariant tensor field network (TFN) refinement layers with invariant graph-attention layers to model discrete and continuous data jointly, with potential applications in drug design and enzyme engineering. Structure generation and pocket design are thereby unified in a single self-conditioned flow framework.

Abstract

A significant amount of protein function requires binding small molecules, including enzymatic catalysis. As such, designing binding pockets for small molecules has several impactful applications ranging from drug synthesis to energy storage. Towards this goal, we first develop HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on our self-conditioned flow matching objective. FlowSite extends this flow model to jointly generate a protein pocket's discrete residue types and the molecule's binding 3D structure. We show that HarmonicFlow improves upon state-of-the-art generative processes for docking in simplicity, generality, and average sample quality in pocket-level docking. Enabled by this structure modeling, FlowSite designs binding sites substantially better than baseline approaches.

Paper Structure

This paper contains 36 sections, 8 equations, 17 figures, 7 tables, 4 algorithms.

Figures (17)

  • Figure 1: Binding site design. Given the backbone (green) and multi-ligand without structure, FlowSite generates residue types and structure (orange) to bind the multi-ligand and its jointly generated structure (blue). The majority of the pocket is omitted for visibility.
  • Figure 2: Overview of FlowSite. The generative process starts from a protein pocket's backbone atoms, initial residue types $\Tilde{{\bm{a}}}^0$, and initial ligand positions ${{\bm{x}}}_0$. Our joint discrete-continuous self-conditioned flow updates them to ${{\bm{a}}}_t$, ${{\bm{x}}}_t$ by following its vector field defined by the model outputs $\Tilde{{\bm{a}}}_1^t$, $\Tilde{{\bm{x}}}_1^t$. This integration is repeated until reaching time $t=1$ with the produced sample ${{\bm{a}}}_1$, ${{\bm{x}}}_1$.
  • Figure 3: Harmonic Prior. Initial positions for the same single multi-ligand from an isotropic Gaussian (left) and from a harmonic prior (right). (The bound structure for this multi-ligand is shown in Figure \ref{fig:multi_ligand_task_explanation}.)
  • Figure 4: FlowSite self-conditioned updates. Residue type predictions $\Tilde{{\bm{a}}}_1^t$ from invariant GAT layers and position predictions $\Tilde{{\bm{x}}}_1^t$ from equivariant TFN layers are used as self-conditioning inputs and to interpolate to the updates ${{\bm{a}}}_t$, ${{\bm{x}}}_t$.
  • Figure 5: Visualization of fake ligand creation. Depicted is a fake ligand created for the Ubiquitin protein. Among all residues that have at least 4 contacts with other residues (not counting residues within 7 positions of them along the chain), one residue is selected at random as the fake ligand. That residue and all residues within 7 positions of it along the chain are then removed from the protein.
  • ...and 12 more figures
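
The fake-ligand procedure described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption does not define a "contact", so the sketch assumes one representative 3D coordinate per residue and a hypothetical distance cutoff of 8.0. The function names `count_contacts` and `make_fake_ligand` are also hypothetical, and "within 7 positions" is read as inclusive.

```python
import math
import random


def count_contacts(coords, i, min_separation=7, cutoff=8.0):
    """Count residues in contact with residue i, skipping chain neighbours.

    Assumption: a contact is a representative-atom distance below `cutoff`.
    """
    n = 0
    for j, c in enumerate(coords):
        if abs(i - j) <= min_separation:  # skips i itself and nearby residues
            continue
        if math.dist(coords[i], c) < cutoff:
            n += 1
    return n


def make_fake_ligand(coords, min_contacts=4, min_separation=7, cutoff=8.0, rng=None):
    """Pick a residue as fake ligand and return (ligand_index, kept_indices).

    Returns None when no residue has enough contacts.
    """
    rng = rng or random.Random()
    candidates = [
        i for i in range(len(coords))
        if count_contacts(coords, i, min_separation, cutoff) >= min_contacts
    ]
    if not candidates:
        return None
    ligand = rng.choice(candidates)
    # Remove the ligand residue and everything within min_separation of it.
    kept = [j for j in range(len(coords)) if abs(j - ligand) > min_separation]
    return ligand, kept


# Toy usage: 30 residues packed into a small grid, so all pairs are close.
compact = [(i % 3, (i // 3) % 3, i // 9) for i in range(30)]
result = make_fake_ligand(compact, rng=random.Random(0))
```

In the toy usage every residue qualifies as a candidate, so the returned `kept` list is simply the chain with a window of up to 15 residues removed around the selected index.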
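
The self-conditioned integration described in the Figure 2 and Figure 4 captions can be illustrated, for the continuous ligand positions only, with a small Euler loop. This is a sketch under stated assumptions rather than the paper's algorithm: it assumes a linear interpolation path between $x_0$ and $x_1$, so the velocity implied by a prediction $\tilde{x}_1^t$ is $(\tilde{x}_1^t - x_t)/(1-t)$, and it initializes the self-conditioning input with $x_0$. The `model` callable and the name `integrate_positions` are hypothetical, and positions are flattened to a plain list of floats.

```python
def integrate_positions(x0, model, num_steps=20):
    """Euler integration of a flow whose model predicts the endpoint x1.

    model(x_t, x1_prev, t) returns a new endpoint prediction; x1_prev is the
    previous prediction, fed back as the self-conditioning input.
    """
    x_t = list(x0)
    x1_pred = list(x0)  # assumption: first self-conditioning input is x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x1_pred = model(x_t, x1_pred, t)
        # Velocity for a linear path: points from x_t to the predicted endpoint.
        x_t = [x + dt * (p - x) / (1.0 - t) for x, p in zip(x_t, x1_pred)]
    return x_t


# Toy stand-in for a trained network: each call moves the previous prediction
# halfway towards a fixed target, mimicking iterative refinement.
target = [1.0, -2.0, 0.5]


def toy_model(x_t, x1_prev, t):
    return [0.5 * (p + g) for p, g in zip(x1_prev, target)]


sample = integrate_positions([0.0, 0.0, 0.0], toy_model)
```

Because the last step uses $\Delta t/(1-t) = 1$, the final sample coincides with the last endpoint prediction, which is why the quality of the self-conditioned predictions directly determines the quality of the sample in this parameterization.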