Multi-granularity Score-based Generative Framework Enables Efficient Inverse Design of Complex Organics
Zijun Chen, Yu Wang, Liuzhenghao Lv, Hao Li, Zongying Lin, Li Yuan, Yonghong Tian
TL;DR
OrgMol-Design tackles inverse design of complex organics by combining a fragment-prior score-based generator for coarse-grained scaffolds with a chemistry-informed fine-grained bond scorer. It models generation over a fragment graph $\mathbf{G}^{\mathcal{F}}=(\mathbf{F},\mathbf{C})$ using two score networks, $\boldsymbol{\epsilon}_{\theta,t}$ and $\boldsymbol{\epsilon}_{\phi,t}$, to estimate node and topology scores across time steps $t \in [0,T]$, and then refines assembled structures via a bond-scoring module that enforces chemical validity. A learned fragment vocabulary built with a Byte Pair Encoding–style bottom-up merge reduces atomic complexity and preserves essential substructures. Across four challenging benchmarks (OPVs, reaction substrates, organic emitters, and protein ligands), OrgMol-Design achieves state-of-the-art results and substantial efficiency gains over atom-level diffusion baselines, underscoring the value of fragment priors for scalable, high-quality inverse design of complex organics.
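The bottom-up merge idea behind the fragment vocabulary can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, purely for illustration, that molecules are flat strings of single-character atom tokens, whereas the actual vocabulary is built over molecular graphs. The function names `learn_merges` and `merge_pair` are hypothetical.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one fused token."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])  # fuse the two tokens into a fragment
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(sequences, num_merges):
    """Greedy BPE-style merging over toy token sequences (an assumed simplification).

    Repeatedly fuses the most frequent adjacent token pair, so frequent
    substructures become single vocabulary entries.
    """
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the corpus.
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        # Most frequent pair; ties are broken lexicographically for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        if pair_counts[best] < 2:
            break  # a pair seen only once is not worth a vocabulary entry
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges, seqs


if __name__ == "__main__":
    merges, seqs = learn_merges(["CCO", "CCN", "CCO"], num_merges=2)
    print(merges)
    print(seqs)
```

Each merge shortens the sequences, which mirrors how a fragment vocabulary reduces the number of units a generator has to place compared with atom-level modelling.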
Abstract
Efficiently searching enormous chemical libraries to design targeted molecules is crucial for accelerating drug discovery, organic chemistry, and optoelectronic materials research. Although generative models can now produce novel drug-like molecules, more realistic scenarios remain difficult: the complexity of functional groups (e.g., pyrene, acenaphthylene, and bridged-ring systems) and extensive molecular scaffolds are major obstacles to generating complex organics. Traditionally, the former demands an extra learning process, e.g., molecular pre-training, and the latter requires expensive computational resources. To address these challenges, we propose OrgMol-Design, a multi-granularity framework for efficiently designing complex organics. OrgMol-Design combines a score-based generative model with a fragment prior, which generates diverse coarse-grained scaffolds, and a chemical-rule-aware scoring model, which designs the fine-grained molecular structure. This circumvents the difficulty of learning intricate substructures without losing the connection details among fragments. Our approach achieves state-of-the-art performance on four challenging real-world benchmarks covering a broad range of scientific domains, outperforming advanced molecular generative models. It also delivers a substantial speedup and a reduction in GPU memory usage compared with diffusion-based graph models. Our results further demonstrate the importance of leveraging fragment priors for a generalized molecular inverse design model.
