Differentiable Scaffolding Tree for Molecular Optimization
Tianfan Fu, Wenhao Gao, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun
TL;DR
This work tackles molecular optimization by introducing a differentiable scaffolding tree (DST) that makes substructure-level representations locally differentiable, enabling gradient-based optimization on discrete molecular graphs. A graph neural network serves as a surrogate oracle to guide DST updates, while a determinantal point process (DPP) promotes diversity among the generated molecules. DST achieves sample-efficient optimization with improved interpretability, requiring fewer online oracle calls than baselines. Experiments on ZINC 250K across multiple objectives show that DST often outperforms state-of-the-art baselines in both objective scores and diversity, underscoring its potential for drug discovery and materials design.
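The diversity step can be illustrated with a generic greedy selection under a DPP. This is a minimal sketch under stated assumptions, not DST's exact procedure: it builds the standard quality-diversity kernel L = diag(q) S diag(q) from hypothetical per-molecule quality scores `quality` (for example, surrogate-predicted property values mapped to positive numbers) and a pairwise similarity matrix `similarity` (for example, fingerprint similarity), then repeatedly adds the candidate that most increases log det(L_S). The function name `dpp_greedy_select` is illustrative.

```python
import numpy as np


def dpp_greedy_select(quality, similarity, k):
    """Greedy approximate MAP selection of k items under a DPP.

    Assumption (illustrative, not the paper's exact kernel):
    L = diag(quality) @ similarity @ diag(quality), so det(L_S) rewards
    subsets that are both high-scoring and mutually dissimilar.
    """
    quality = np.asarray(quality, dtype=float)
    similarity = np.asarray(similarity, dtype=float)
    # Quality-diversity decomposition of the DPP kernel.
    L = quality[:, None] * similarity * quality[None, :]
    selected = []
    for _ in range(k):
        best_idx, best_logdet = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            # log-determinant of the kernel restricted to the candidate subset
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:  # no candidate keeps L_S positive definite
            break
        selected.append(best_idx)
    return selected


# Hypothetical example: candidates 0 and 1 are near-duplicates, 2 is distinct.
quality = [1.0, 0.95, 0.9]
similarity = [[1.0, 0.99, 0.1],
              [0.99, 1.0, 0.1],
              [0.1, 0.1, 1.0]]
print(dpp_greedy_select(quality, similarity, 2))  # [0, 2]
```

The greedy pass first keeps the top scorer (0) and then prefers the dissimilar candidate 2 over the slightly higher-scoring near-duplicate 1, trading a little objective value for diversity.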
Abstract
The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications such as drug discovery. Deep generative models and combinatorial optimization methods have achieved initial success but still struggle to model discrete chemical structures directly and often rely heavily on brute-force enumeration. The challenge stems from the discrete, non-differentiable nature of molecular structures. To address this, we propose the differentiable scaffolding tree (DST), which uses a learned knowledge network to convert discrete chemical structures into locally differentiable ones. DST enables gradient-based optimization on a chemical graph structure by back-propagating derivatives of the target properties through a graph neural network (GNN). Our empirical studies show that gradient-based molecular optimization with DST is both effective and sample efficient. Furthermore, the learned graph parameters can also provide explanations that help domain experts understand the model output.
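To make the idea of back-propagating property derivatives to graph parameters concrete, here is a toy sketch under explicit assumptions. It uses a hypothetical relaxation in which each node of a small graph gets a soft inclusion weight v = sigmoid(theta), and a fixed one-layer message-passing function stands in for the trained GNN surrogate. DST's actual scaffolding-tree relaxation and GNN are more elaborate; the functions `surrogate_and_grad` and `optimize` and the parameters `x`, `c`, `lr`, and `steps` are illustrative only. The gradient is derived by hand with the chain rule so that only NumPy is needed; a practical implementation would normally use an autodiff framework.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def surrogate_and_grad(theta, adj, x, c):
    """Toy differentiable surrogate f(theta) and its gradient (hypothetical).

    v = sigmoid(theta)          soft node-inclusion weights in (0, 1)
    h = (adj + I) @ (v * x)     one round of neighbour aggregation
    f = sum(c * tanh(h))        scalar predicted property
    """
    v = sigmoid(theta)
    m = adj + np.eye(len(x))
    h = m @ (v * x)
    f = float(np.sum(c * np.tanh(h)))
    # Back-propagate df/dh -> df/dv -> df/dtheta with the chain rule.
    dh = c * (1.0 - np.tanh(h) ** 2)
    dv = x * (m.T @ dh)
    dtheta = dv * v * (1.0 - v)
    return f, dtheta


def optimize(theta, adj, x, c, lr=0.5, steps=100):
    """Plain gradient ascent on the surrogate's predicted property."""
    theta = np.asarray(theta, dtype=float)
    for _ in range(steps):
        _, g = surrogate_and_grad(theta, adj, x, c)
        theta = theta + lr * g
    return theta


# Hypothetical 3-node path graph whose middle node carries a negative feature.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([1.0, -1.0, 1.0])
c = np.ones(3)
theta = optimize(np.zeros(3), adj, x, c)
# Discretize the relaxed weights back into a graph edit by thresholding.
keep = sigmoid(theta) > 0.5
print(keep.tolist())  # [True, False, True]
```

Gradient ascent pushes the middle node's weight toward 0 and the end nodes' weights toward 1, so thresholding at 0.5 yields a discrete edit (dropping node 1) that raises the surrogate's prediction. The continuous weights also hint at why such parameters can be read as explanations: they show which parts of the graph the surrogate favors.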
