Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure
Ian Dunn, David Ryan Koes
TL;DR
Diffusion models for structure-based drug design are hampered by the poor scaling of graph-based representations with graph size and by slow inference when the protein is represented in all-atom detail. The authors introduce a GNN that learns a latent representation of the binding pocket: a compact keypoint-based encoding $z^{(KP)}$ that conditions the diffusion model for de novo ligand design. The encoder is trained end-to-end with the diffusion model, and an optimal transport loss aligns the keypoints with interface regions. The approach reduces inference time by about $3\times$ while maintaining ligand quality comparable to all-atom baselines: GVP-based keypoints closely match all-atom performance, whereas EGNN-based keypoints match $C_\alpha$ baselines. These results demonstrate scalable, high-quality diffusion-driven ligand generation and provide guidance on when to favor GVP over EGNN architectures for molecular structure representations.
Abstract
Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.
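The TL;DR mentions an optimal transport loss that aligns the learned keypoints with interface regions. The exact formulation is not given in this section, so the following is only a minimal sketch of one generic way such a term could be computed: entropy-regularized optimal transport (Sinkhorn iterations) between keypoint positions and interface point positions, with uniform masses and squared Euclidean cost. The function names, the regularization strength `epsilon`, and the toy coordinates are all assumptions made for illustration and are not taken from the paper.

```python
import math


def pairwise_sq_dists(xs, ys):
    """Squared Euclidean distance between every point in xs and every point in ys."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_ot_cost(keypoints, interface_points, epsilon=1.0, n_iters=200):
    """Entropy-regularized OT cost between two point sets with uniform masses.

    Hypothetical helper: a generic Sinkhorn sketch, not the authors' loss.
    Returns (transport cost, transport plan).
    Note: exp(-cost / epsilon) can underflow for very distant points; a
    log-domain implementation would be needed in that regime.
    """
    cost = pairwise_sq_dists(keypoints, interface_points)
    n, m = len(keypoints), len(interface_points)
    a = [1.0 / n] * n  # uniform mass on each keypoint
    b = [1.0 / m] * m  # uniform mass on each interface point
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


# Toy coordinates (made up): three interface points and two keypoint placements.
interface = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
keypoints_near = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
keypoints_far = [(0.0, 5.0, 0.0), (2.0, 5.0, 0.0)]

cost_near, plan_near = sinkhorn_ot_cost(keypoints_near, interface)
cost_far, plan_far = sinkhorn_ot_cost(keypoints_far, interface)
print(f"near: {cost_near:.3f}  far: {cost_far:.3f}")
```

Under these assumptions, keypoints placed on the interface incur a lower transport cost than keypoints displaced away from it, which is the behavior an alignment term of this kind is meant to reward. In a trained model the cost would be computed with a differentiable tensor library so that gradients reach the keypoint encoder; plain Python is used here only to keep the sketch self-contained.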
