Exploring Molecule Generation Using Latent Space Graph Diffusion
Prashanth Pombala, Gerrit Grossmann, Verena Wolf
TL;DR
This work investigates latent-space diffusion for molecular graph generation, contrasting Gaussian diffusion, heat diffusion, and flow matching in latent space and comparing GNN and EGNN backbones. Molecules are encoded as atom point clouds in a latent space and then decoded into labeled molecular graphs by an edge-type predictor, which enables de novo generation by iteratively restoring latent points from noise. Experiments on QM9 reveal clear trade-offs: 2D latent spaces yield high validity with reasonable diversity, while higher-dimensional latent spaces increase uniqueness at the cost of validity and computation; heat diffusion offers high validity with lower diversity, and flow matching balances the two. The results indicate that the choice of representation (latent vs. input space) and of diffusion strategy strongly influences generation quality and computational efficiency, which can inform the design of future diffusion-based molecular generators.
Abstract
Generating molecular graphs is a challenging task due to their discrete nature and the competing objectives involved. Diffusion models have emerged as state-of-the-art approaches to data generation across various modalities. For molecular graphs, graph neural networks (GNNs) used as a diffusion backbone have achieved impressive results. Latent space diffusion, where diffusion takes place in a low-dimensional space reached via an autoencoder, has proven computationally efficient. However, the literature on latent space diffusion for molecular graphs is scarce, and no commonly accepted best practices exist. In this work, we explore different approaches and hyperparameters, contrasting generative flow models (denoising diffusion, flow matching, heat dissipation) and architectures (GNNs and E(3)-equivariant GNNs). Our experiments reveal a high sensitivity to the choice of approach and to design decisions. Code is available at github.com/Prashanth-Pombala/Molecule-Generation-using-Latent-Space-Graph-Diffusion.
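
To make the contrast between two of the generative flow models concrete, the following is a minimal sketch of how a latent atom point cloud might be corrupted under denoising diffusion and under flow matching. It is an illustration under stated assumptions, not the implementation from the repository above: the function names, the 5-atom 2D latent cloud, the noise level `alpha_bar_t`, and the linear interpolation path are all hypothetical choices, and heat dissipation is omitted.

```python
import numpy as np

def gaussian_forward(z0, alpha_bar_t, rng):
    """DDPM-style forward noising (assumed convention):
    z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return z_t, eps  # eps is the usual regression target for the denoiser

def flow_matching_pair(z0, t, rng):
    """Linear-path flow matching (assumed convention):
    x_t = (1 - t) * noise + t * z0, with target velocity z0 - noise."""
    noise = rng.standard_normal(z0.shape)
    x_t = (1.0 - t) * noise + t * z0
    velocity = z0 - noise  # constant along the straight path from noise to data
    return x_t, velocity

rng = np.random.default_rng(0)
# Hypothetical latent encoding: 5 atoms, each a point in a 2D latent space.
z0 = rng.standard_normal((5, 2))

z_t, eps = gaussian_forward(z0, alpha_bar_t=0.5, rng=rng)
x_t, v = flow_matching_pair(z0, t=0.25, rng=rng)
```

In both cases a backbone such as a GNN or an E(3)-equivariant GNN would be trained to predict the returned target (`eps` or `v`) from the corrupted point cloud; generation then starts from pure noise and applies the learned model iteratively before the latent points are decoded into a labeled graph.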
