Prior-Guided Flow Matching for Target-Aware Molecule Design with Learnable Atom Number

Jingyuan Zhou, Hao Qian, Shikui Tu, Lei Xu

TL;DR

PAFlow is a novel target-aware molecular generation model featuring prior interaction guidance and a learnable atom number predictor. It achieves a new state of the art in binding affinity while maintaining favorable molecular properties.

Abstract

Structure-based drug design (SBDD), which aims to generate 3D molecules with high binding affinity toward target proteins, is a vital approach in novel drug discovery. Although recent generative models have shown great potential, they suffer from unstable probability dynamics and a mismatch between the size of the generated molecule and the geometry of the protein pocket, resulting in inconsistent quality and off-target effects. We propose PAFlow, a novel target-aware molecular generation model featuring prior interaction guidance and a learnable atom number predictor. PAFlow adopts the efficient flow matching framework to model the generation process and constructs a new form of conditional flow matching for discrete atom types. A protein-ligand interaction predictor is incorporated to guide the vector field toward higher-affinity regions during generation, while an atom number predictor based on protein pocket information is designed to better align the generated molecule size with the target geometry. Extensive experiments on the CrossDocked2020 benchmark show that PAFlow achieves a new state of the art in binding affinity (up to -8.31 Avg. Vina Score) while maintaining favorable molecular properties.

Paper Structure

This paper contains 49 sections, 31 equations, 11 figures, 11 tables, 2 algorithms.

Figures (11)

  • Figure 1: Overview of the PAFlow generation process. The atom predictor first estimates the number of atoms in the generated molecule based on the target protein information, which is then used to initialize the molecule. SE(3)-EGNN is applied to predict vector fields for atomic coordinates and atom types, while a protein-ligand interaction predictor provides prior binding guidance for the coordinate vector field. The final ligand molecule with high binding affinity is obtained through iterative updates. For simplicity, $\bar{\alpha}_{1-t}$ is denoted as $\bar{\alpha}$.
  • Figure 2: Median Vina energy for different generated molecules (ALiDiff, MolCRAFT, PAFlow) across 100 testing binding targets. The proteins are sorted by the median Vina energy of molecules generated from PAFlow.
  • Figure 3: Visualizations of reference molecules and molecules generated by ALiDiff and PAFlow for protein pockets (4YHJ, 3DZH, 2Z3H and 2JJG). Vina Score, QED and SA are reported below.
  • Figure 4: Average time required by different methods to generate 100 molecules for a target protein, with shorter times indicating higher sampling efficiency.
  • Figure 5: Atomic coordinate trajectories from $t=0$ to $t=1$ under different sampling strategies with the same probability paths. (a) shows the generation trajectory of TargetDiff, while (b) presents that of PAFlow w/o PA.
  • ...and 6 more figures
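
The generation loop summarized in the Figure 1 caption (estimate the atom count from the pocket, initialize the molecule, then update coordinates iteratively with a vector field plus interaction guidance) can be illustrated with a toy sketch. This is not the authors' implementation: `predict_num_atoms`, `toy_velocity`, and `toy_guidance_grad` are hypothetical stand-ins for the learned atom number predictor, the SE(3)-EGNN vector field, and the protein-ligand interaction predictor, and the `guidance_scale` parameter and the plain Euler update are assumptions. Atom types are omitted; only coordinates are integrated.

```python
import numpy as np

def predict_num_atoms(pocket_coords):
    # Hypothetical stand-in for the learned atom number predictor:
    # a fixed heuristic proportional to the number of pocket atoms.
    return max(1, int(round(0.5 * len(pocket_coords))))

def toy_velocity(x, t, pocket_coords):
    # Hypothetical stand-in for the learned coordinate vector field:
    # a simple pull toward the pocket centroid (t is unused in this toy).
    return pocket_coords.mean(axis=0) - x

def toy_guidance_grad(x, pocket_coords):
    # Hypothetical stand-in for the interaction predictor's gradient.
    # Toy score: -||x - centroid||^2 per atom, so the gradient is
    # -2 * (x - centroid), which also points toward the centroid.
    return -2.0 * (x - pocket_coords.mean(axis=0))

def sample_coords(pocket_coords, n_steps=50, guidance_scale=0.1, seed=0):
    # Euler integration of dx/dt = v(x, t) + guidance_scale * grad(score)
    # from t = 0 (Gaussian noise around the pocket) to t = 1.
    rng = np.random.default_rng(seed)
    n_atoms = predict_num_atoms(pocket_coords)
    x = pocket_coords.mean(axis=0) + rng.standard_normal((n_atoms, 3))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = toy_velocity(x, t, pocket_coords)
        g = toy_guidance_grad(x, pocket_coords)
        x = x + dt * (v + guidance_scale * g)
    return x

# Usage: a random 20-atom "pocket" yields a 10-atom toy ligand.
pocket = np.random.default_rng(1).normal(size=(20, 3))
guided = sample_coords(pocket, guidance_scale=0.1)
unguided = sample_coords(pocket, guidance_scale=0.0)
```

In this toy setting the guidance term only strengthens the pull toward the pocket centroid, so `guided` ends closer to the centroid than `unguided`. In a real model the guidance would come from the gradient of a learned interaction score rather than a closed-form expression.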