Junction Tree Variational Autoencoder for Molecular Graph Generation
Wengong Jin, Regina Barzilay, Tommi Jaakkola
TL;DR
The paper addresses automated molecular design by learning continuous embeddings and directly generating molecular graphs. It introduces the Junction Tree Variational Autoencoder (JT-VAE), which first builds a junction-tree scaffold of valid subgraphs and then assembles them into a full molecule, enforcing chemical validity throughout generation. The model jointly learns a two-part latent space, $\boldsymbol{z}=[\boldsymbol{z}_{\boldsymbol{\tau}}, \boldsymbol{z}_G]$, corresponding to the tree structure and the fine-grained graph, encoded via a tree and graph encoder and decoded through a tree and graph decoder. Empirically, JT-VAE outperforms SMILES-based baselines on generation and optimization tasks, achieving $100\%$ prior validity, strong molecule reconstruction, and superior results in Bayesian optimization and constrained optimization, highlighting its practical impact for scalable, valid molecular graph generation.
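As a rough illustration of the two-part latent code described above, the sketch below samples a tree part and a graph part separately and concatenates them into one vector. It is not the authors' implementation: the function names, the toy dimensions (2 for the tree part, 3 for the graph part), and the use of the standard VAE reparameterization step are all assumptions made for illustration, and the encoder outputs are hard-coded rather than computed from a molecule.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    Standard VAE reparameterization; assumed here, not taken from the summary.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def sample_latent(tree_mu, tree_logvar, graph_mu, graph_logvar, rng):
    """Sample the tree and graph parts independently and concatenate them."""
    z_tree = reparameterize(tree_mu, tree_logvar, rng)
    z_graph = reparameterize(graph_mu, graph_logvar, rng)
    return z_tree + z_graph  # list concatenation: z = [z_tree, z_graph]


def split_latent(z, tree_dim):
    """Recover the two parts so each can be routed to its own decoder."""
    return z[:tree_dim], z[tree_dim:]


# Hypothetical encoder outputs (means and log-variances) for one molecule.
rng = random.Random(0)
z = sample_latent(
    tree_mu=[0.0, 1.0], tree_logvar=[-2.0, -2.0],
    graph_mu=[0.5, -0.5, 2.0], graph_logvar=[-2.0, -2.0, -2.0],
    rng=rng,
)
z_tree, z_graph = split_latent(z, tree_dim=2)
```

In this sketch the tree decoder would consume `z_tree` and the graph decoder would consume `z_graph`; keeping the split explicit makes it clear which part of the code is meant to capture the scaffold and which the fine-grained connectivity.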
Abstract
We seek to automate the design of molecules based on specific chemical properties. In computational terms, this task involves continuous embedding and generation of molecular graphs. Our primary contribution is the direct realization of molecular graphs, a task previously approached by generating linear SMILES strings instead of graphs. Our junction tree variational autoencoder generates molecular graphs in two phases, by first generating a tree-structured scaffold over chemical substructures, and then combining them into a molecule with a graph message passing network. This approach allows us to incrementally expand molecules while maintaining chemical validity at every step. We evaluate our model on multiple tasks ranging from molecular generation to optimization. Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.
