Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure

Ian Dunn, David Ryan Koes

TL;DR

Diffusion models for structure-based drug design scale poorly with graph size and are slow at inference when the protein is represented in all-atom detail. The authors use a GNN to learn a latent pocket representation: a compact set of keypoints, $z^{(KP)}$, that conditions the diffusion model for de novo ligand design. The encoder is trained end-to-end with the diffusion model, using an optimal transport loss to align keypoints with interface regions. The approach achieves roughly a $3\times$ reduction in inference time while maintaining ligand quality comparable to all-atom baselines; GVP-based keypoints closely match all-atom performance, while EGNN keypoints match $C_\alpha$ baselines. These results demonstrate scalable, high-quality diffusion-driven ligand generation and provide guidance on when to favor GVP over EGNN architectures for molecular structure representations.

Abstract

Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.

Paper Structure (24 sections, 14 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Message passing is performed between receptor nodes. Learned receptor embeddings are used to place keypoints inside the binding pocket. Keypoints extract local features of the binding pocket. Keypoints are then used to condition the ligand generation process.
  • Figure 2: Left, Middle: CDFs of ligand RMSD from force-field minimization and Vina score. Right: Sampling time per molecule averaged over the same ten binding pockets for each model.
  • Figure 3: Left, Middle: CDFs of ligand RMSD from force-field minimization and Vina score. Right: Sampling time per molecule averaged over the same ten binding pockets for each model.
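
The Figure 1 caption says that learned receptor embeddings are used to place keypoints inside the binding pocket. One simple way to realize that idea, assumed here for illustration and not confirmed by this summary as the authors' mechanism, is to put each keypoint at an attention-weighted average of receptor atom positions. The function name `place_keypoints`, the toy coordinates, and the attention scores below are all hypothetical; in a trained model the scores would come from the learned embeddings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def place_keypoints(atom_positions, attention_logits):
    """Place each keypoint at an attention-weighted average of atom positions.

    atom_positions: list of (x, y, z) tuples, one per receptor atom.
    attention_logits: one list of per-atom scores per keypoint
        (hypothetical stand-in for scores derived from learned embeddings).
    """
    keypoints = []
    for logits in attention_logits:
        weights = softmax(logits)  # non-negative, sums to 1
        keypoints.append(tuple(
            sum(w * pos[axis] for w, pos in zip(weights, atom_positions))
            for axis in range(3)
        ))
    return keypoints


# Toy pocket: four atoms and two keypoints with made-up scores.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
logits = [
    [0.0, 0.0, 0.0, 0.0],  # uniform attention: keypoint lands at the centroid
    [5.0, 0.0, 0.0, 0.0],  # attention concentrated on the first atom
]
keypoints = place_keypoints(atoms, logits)
```

Because the weights are non-negative and sum to one, each keypoint in this sketch stays inside the convex hull of the atoms and moves rigidly with them under translation or rotation, which is the kind of behavior one would want from a pocket-centered representation.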