PropMolFlow: Property-Guided Molecule Generation with Geometry-Complete Flow Matching
Cheng Zeng, Jirui Jin, Connor Ambrose, George Karypis, Mark Transtrum, Ellad B. Tadmor, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Mingjie Liu
TL;DR
PropMolFlow advances property-guided 3D molecule generation by combining property embeddings with geometry-complete SE(3)-equivariant flow matching. It jointly generates atom types, charges, bond orders, and coordinates, using a continuous-time Markov chain (CTMC) for the discrete variables and a continuous flow for the coordinates. Conditioning is supported through several property embedding methods, including a Gaussian expansion of scalar properties. The method achieves competitive in-distribution (ID) performance against diffusion baselines, samples with as few as 100 steps, and is validated through density functional theory (DFT) calculations and an out-of-distribution generation task that probes novelty and extrapolation. These findings position PropMolFlow as a property-aware generator for small molecules and a foundation for future extensions to larger datasets and multi-property conditioning, with potential integration into active learning and topology-aware design.
Abstract
Molecule generation is advancing rapidly in chemical discovery and drug design. Flow matching methods have recently set the state of the art (SOTA) in unconditional molecule generation, surpassing score-based diffusion models. However, diffusion models still lead in property-guided generation. In this work, we introduce PropMolFlow, an approach for property-guided molecule generation based on geometry-complete SE(3)-equivariant flow matching. Integrating five different property embedding methods with a Gaussian expansion of scalar properties, PropMolFlow achieves competitive performance against previous SOTA diffusion models in conditional molecule generation while maintaining high structural stability and validity. Additionally, it samples faster than baseline models, requiring fewer time steps. We highlight the importance of validating the properties of generated molecules through density functional theory (DFT) calculations. Furthermore, we introduce a task that assesses the model's capacity for out-of-distribution generalization by asking it to propose molecules with underrepresented property values.
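To make the idea of a Gaussian expansion of a scalar property concrete, the following is a minimal sketch and not the PropMolFlow implementation. It assumes evenly spaced Gaussian centers over a chosen property range and a shared width; the function name `gaussian_expansion` and the parameters `lo`, `hi`, `num_centers`, and `width` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_expansion(value, lo, hi, num_centers=8, width=None):
    """Expand a scalar into Gaussian basis features.

    Assumption (not from the paper): centers are evenly spaced on [lo, hi]
    and all Gaussians share one width, defaulting to the center spacing.
    """
    if num_centers < 2:
        raise ValueError("num_centers must be at least 2")
    step = (hi - lo) / (num_centers - 1)
    if width is None:
        width = step
    centers = [lo + i * step for i in range(num_centers)]
    # Each feature measures how close the value is to one center.
    return [math.exp(-((value - c) ** 2) / (2.0 * width ** 2)) for c in centers]


# Hypothetical example: a property value of 0.25 on a normalized [0, 1] range.
features = gaussian_expansion(0.25, lo=0.0, hi=1.0, num_centers=5)
```

The resulting vector peaks at the center nearest the target value and decays smoothly for more distant centers, which gives a downstream embedding layer a richer input than the raw scalar alone.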
