FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction

Alex Morehead, Jianlin Cheng

TL;DR

FlowDock presents a novel flow‑matching framework that jointly predicts protein–ligand complex structures and binding affinities by mapping apo to holo states under a Riemannian geometric setting. It combines an ESMFold‑based protein prior with a harmonic ligand prior and a VD‑ODE sampling strategy, enabling efficient multi‑ligand docking and affinity estimation. Across PoseBusters, DockGen‑E, PDBBind, and CASP16 benchmarks, FlowDock achieves competitive or superior structure predictions and affinity correlations while maintaining favorable computational efficiency, supporting fast virtual screening of pharma‑relevant targets. The work provides public data, code, and pretrained models to accelerate development and benchmarking in structure‑guided drug discovery.

Abstract

Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or has been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. On the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate, using unbound (apo) protein input structures and no information derived from multiple sequence alignments. On the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
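To make the idea of mapping apo structures to holo structures with conditional flow matching more concrete, the sketch below shows the generic technique on toy coordinates. It is not FlowDock's implementation. It assumes a straight-line Euclidean path between a start point and a target point and a plain fixed-step Euler integrator, whereas the paper describes a Riemannian setting, learned priors, and its own VD-ODE sampler. All function names here (`interpolate`, `target_velocity`, `euler_integrate`) are hypothetical and chosen only for illustration.

```python
def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (start) to x1 (target)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; a network would be regressed onto this."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(x0, velocity_fn, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-ins (assumption): flattened coordinates of an "apo" start state
# and a "holo" target state.
apo = [0.0, 1.0, 2.0]
holo = [1.0, 1.5, 0.0]

# In training, a model would see interpolate(apo, holo, t) and be fit to
# target_velocity(apo, holo). Here an oracle stands in for the trained model.
oracle = lambda x, t: target_velocity(apo, holo)
sample = euler_integrate(apo, oracle, num_steps=20)
midpoint = interpolate(apo, holo, 0.5)
```

With the oracle velocity field, integration from the start state arrives at the target state up to floating-point error. A trained model would replace `oracle` with a learned function of the current coordinates and time.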

Paper Structure

This paper contains 21 sections, 5 equations, 11 figures, 2 tables, 2 algorithms.

Figures (11)

  • Figure 1: An overview of biomolecular distribution modeling with FlowDock.
  • Figure 2: Protein-ligand docking success rates of each baseline method on the PoseBusters Benchmark set (n=308). Error bars: 3 runs.
  • Figure 3: Comparison of each flexible docking method's protein conformational changes made for the PoseBusters Benchmark set (n=308).
  • Figure 4: Protein-ligand docking success rates of each baseline method on the DockGen-E set (n=122). Error bars: 3 runs.
  • Figure 5: Comparison of each flexible docking method's protein conformational changes made for the DockGen-E set (n=122).
  • ...and 6 more figures