Auto-encoding Molecules: Graph-Matching Capabilities Matter
Magnus Cunow, Gerrit Großmann
TL;DR
This work tackles the challenge of autoencoding molecular graphs by focusing on permutation-invariant reconstruction and decoding from latent space. It introduces a transformer-based one-shot graph decoder and a differentiable graph-matching loss to provide strong, permutation-aware gradient signals, enabling improved de novo molecule generation. Empirical results on QM9 show that higher-precision graph matching yields faster training convergence and better generation metrics (validity, uniqueness, novelty), with optimal matching consistently outperforming near-optimal alternatives. The study also discusses scalability constraints due to permutation explosion and outlines directions for improving data efficiency and applying graph-matching ideas to larger graphs. Overall, the paper demonstrates that matching quality is a key driver of both learning dynamics and generation quality in graph VAEs for molecules.
Abstract
Autoencoders are effective deep learning models that can function as generative models and learn latent representations for downstream tasks. Graph autoencoders, with both encoder and decoder implemented as message passing networks, are intriguing due to their ability to generate permutation-invariant graph representations. However, this approach faces difficulties because decoding a graph structure from a single vector is challenging, and comparing input and output graphs requires an effective permutation-invariant similarity measure. As a result, many studies rely on approximate methods. In this work, we explore the effect of graph matching precision on the training behavior and generation capabilities of a Variational Autoencoder (VAE). Our contribution is two-fold: (1) We propose a transformer-based message passing graph decoder that leverages global attention mechanisms, making it a more robust and expressive alternative to a graph neural network decoder. (2) We show that the precision of graph matching has a significant impact on training behavior and is essential for effective de novo (molecular) graph generation. Code is available at https://github.com/mcunow/graph-matching
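To make the notion of a permutation-invariant comparison concrete, the following is a minimal sketch, not the differentiable matching loss used in this work. It assumes graphs are given as plain adjacency matrices and finds the optimal node matching by exhaustive search over all n! permutations, which is only feasible for very small graphs. The function names `matching_cost` and `optimal_matching` are hypothetical and chosen for illustration.

```python
from itertools import permutations


def matching_cost(adj_a, adj_b, perm):
    """Count adjacency entries that differ when node i of A maps to perm[i] of B."""
    n = len(adj_a)
    return sum(
        adj_a[i][j] != adj_b[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def optimal_matching(adj_a, adj_b):
    """Exhaustively search all n! node permutations (illustrative only)."""
    n = len(adj_a)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = matching_cost(adj_a, adj_b, perm)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Path graph 0-1-2, and the same graph relabelled so that the centre is node 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
relabelled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

# A naive entrywise comparison reports a mismatch although the graphs are isomorphic.
identity_cost = matching_cost(path, relabelled, (0, 1, 2))

# The optimal matching recovers the relabelling and reports zero cost.
best_perm, best_cost = optimal_matching(path, relabelled)
print(identity_cost, best_perm, best_cost)
```

The naive comparison penalizes a reconstruction that is correct up to node ordering, whereas the matched comparison does not. The factorial growth of the search space is the reason exact matching becomes impractical for larger graphs.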
