FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
Joongwon Lee, Seonghwan Kim, Seokhyun Moon, Hyunwoo Kim, Woo Youn Kim
TL;DR
FragFM tackles the scalability bottleneck of atom-centric molecular graph generation by introducing a fragment-level discrete flow matching framework. A coarse-to-fine autoencoder preserves atom-level connectivity while operating on a fragment-level graph, and a stochastic fragment bag enables efficient exploration of a vast fragment space. The approach supports flexible conditioning via fragment bag reweighting and classifier guidance, enabling precise property-driven design, and introduces NPGen to benchmark natural product-like molecules. Empirical results show state-of-the-art or competitive performance on standard benchmarks, strong NP-focused metrics, and substantially faster sampling, underscoring FragFM's potential for large-scale, property-aware chemical space exploration.
Abstract
We introduce FragFM, a novel hierarchical framework that uses fragment-level discrete flow matching for efficient molecular graph generation. FragFM generates molecules at the fragment level and relies on a coarse-to-fine autoencoder to reconstruct atom-level details. Combined with a stochastic fragment bag strategy that handles an extensive fragment space, the framework enables more efficient and scalable molecular generation. We demonstrate that our fragment-based approach achieves better property control than atom-based approaches and gains additional flexibility through conditioning on the fragment bag. We also propose a Natural Product Generation benchmark (NPGen) to evaluate the ability of modern molecular graph generative models to generate natural product-like molecules. Since natural products are biologically prevalidated and differ from typical drug-like molecules, our benchmark provides a more challenging yet meaningful evaluation relevant to drug discovery. We compare FragFM against various models on diverse molecular generation benchmarks, including NPGen, and demonstrate superior performance. The results highlight the potential of fragment-based generative modeling for large-scale, property-aware molecular design, paving the way for more efficient exploration of chemical space.
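To make the fragment bag idea concrete, the following is a minimal sketch of one way a stochastic bag could be drawn from a large fragment library, with optional reweighting toward a target property. It is not the authors' implementation: the function names `sample_fragment_bag` and `reweight`, the exponential tilting rule, the `temperature` parameter, and the toy SMILES library are all assumptions made for illustration. The sampler uses the standard Efraimidis-Spirakis scheme for weighted sampling without replacement.

```python
import math
import random


def reweight(weights, scores, temperature=1.0):
    """Tilt base weights toward fragments with higher property scores.

    Hypothetical rule (an assumption, not taken from the paper):
    w_i <- w_i * exp(score_i / temperature).
    """
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    return [w * math.exp(s / temperature) for w, s in zip(weights, scores)]


def sample_fragment_bag(library, bag_size, weights=None, seed=None):
    """Draw a bag of distinct fragments from the library.

    Weighted sampling without replacement (Efraimidis-Spirakis):
    each item gets the key u ** (1 / w) with u ~ Uniform[0, 1),
    and the bag_size items with the largest keys are kept.
    Fragments with non-positive weight are never selected.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(library)
    if len(weights) != len(library):
        raise ValueError("weights and library must have the same length")
    keyed = []
    for frag, w in zip(library, weights):
        if w > 0:
            keyed.append((rng.random() ** (1.0 / w), frag))
    keyed.sort(reverse=True)
    return [frag for _, frag in keyed[:bag_size]]


# Toy library of fragment SMILES strings (illustrative only).
library = ["c1ccccc1", "C(=O)O", "CCO", "N", "C1CCNCC1"]

# Unconditional bag: every fragment is equally likely to be included.
uniform_bag = sample_fragment_bag(library, bag_size=3, seed=0)

# Conditioned bag: made-up per-fragment scores bias which fragments appear.
scores = [2.0, 0.0, 0.0, 0.0, 1.0]
tilted = reweight([1.0] * len(library), scores, temperature=0.5)
conditioned_bag = sample_fragment_bag(library, bag_size=3, weights=tilted, seed=0)
```

Under these assumptions, the generative model only ever scores the fragments in the current bag, so the per-step cost depends on the bag size rather than on the size of the full library, and changing the weights changes which regions of fragment space are explored without retraining.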
