Auto-encoding Molecules: Graph-Matching Capabilities Matter

Magnus Cunow, Gerrit Großmann

TL;DR

This work tackles the challenge of autoencoding molecular graphs by focusing on permutation-invariant reconstruction and decoding from latent space. It introduces a transformer-based one-shot graph decoder and a differentiable graph-matching loss to provide strong, permutation-aware gradient signals, enabling improved de novo molecule generation. Empirical results on QM9 show that higher-precision graph matching yields faster training convergence and better generation metrics (validity, uniqueness, novelty), with optimal matching consistently outperforming near-optimal alternatives. The study also discusses scalability constraints due to permutation explosion and outlines directions for improving data efficiency and applying graph-matching ideas to larger graphs. Overall, the paper demonstrates that matching quality is a key driver of both learning dynamics and generation quality in graph VAEs for molecules.

Abstract

Autoencoders are effective deep learning models that can function as generative models and learn latent representations for downstream tasks. Graph autoencoders, with both encoder and decoder implemented as message passing networks, are intriguing because they can generate permutation-invariant graph representations. However, this approach faces two difficulties: decoding a graph structure from a single vector is challenging, and comparing input and output graphs requires an effective permutation-invariant similarity measure. As a result, many studies rely on approximate methods. In this work, we explore the effect of graph matching precision on the training behavior and generation capabilities of a Variational Autoencoder (VAE). Our contribution is two-fold: (1) We propose a transformer-based message passing graph decoder as an alternative to a graph neural network decoder; by leveraging global attention mechanisms, it is more robust and expressive. (2) We show that the precision of graph matching has a significant impact on training behavior and is essential for effective de novo (molecular) graph generation. Code is available at https://github.com/mcunow/graph-matching

Paper Structure

This paper contains 22 sections, 4 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Schematic overview of our molecular representation. (a) Example molecule: methyl isocyanate. (b) Molecules are represented as a fully connected graph in which atoms, bonds, and the absence of bonds between atoms are all represented as nodes. Dotted lines indicate the edges introduced for non-existent molecular bonds. (c) Our internal feature representation matrix $X$, where the first column represents the type of the node (i.e., atom or bond). The remaining columns encode either the atom type (C, H, O, N, F) or the bond type (no bond, single, aromatic, double, triple). Source: grossmanndiscriminator.
  • Figure 2: Permutation-invariant graph matching loss for graph autoencoders. The approach consists of two steps: (1) Compute the best alignment between the input graph $G$ and the reconstructed probabilistic graph $\hat{G}$ by finding the optimal permutation with respect to a given (not necessarily differentiable) distance measure $d$. (2) Use the optimal permutation from the first step to compute the reconstruction loss $L_{rec}$ and backpropagate gradients, as indicated by the purple arrow.
  • Figure 3: Comparison of training behavior and generation capabilities as a function of graph matching quality. Baselines are also included: no matching at all, a simple graph statistics loss, and a GNN-based loss.
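
The two-step procedure described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a simplified representation (one-hot node features `x` and a binary adjacency matrix `a`, rather than the bond-as-node encoding of Figure 1), uses cross-entropy as a stand-in for the distance $d$, and finds the optimal permutation by exhaustive search. All function names are hypothetical.

```python
import itertools
import numpy as np


def permute_graph(x, a, perm):
    """Reorder node features x (n, k) and adjacency a (n, n) by perm."""
    perm = np.asarray(perm)
    return x[perm], a[np.ix_(perm, perm)]


def cross_entropy(target, prob, eps=1e-9):
    """Mean categorical cross-entropy of one-hot targets vs. probabilities."""
    return float(-np.mean(np.sum(target * np.log(prob + eps), axis=-1)))


def binary_cross_entropy(target, prob, eps=1e-9):
    """Mean binary cross-entropy of 0/1 targets vs. edge probabilities."""
    return float(-np.mean(target * np.log(prob + eps)
                          + (1 - target) * np.log(1 - prob + eps)))


def graph_distance(x, a, x_hat, a_hat):
    """Assumed distance d: node-type loss plus edge loss for a fixed ordering."""
    return cross_entropy(x, x_hat) + binary_cross_entropy(a, a_hat)


def best_permutation(x, a, x_hat, a_hat):
    """Step 1: exhaustive search for the ordering of the input graph that
    minimises d against the reconstruction. Cost grows as n!."""
    best_perm, best_d = None, np.inf
    for perm in itertools.permutations(range(x.shape[0])):
        xp, ap = permute_graph(x, a, perm)
        d = graph_distance(xp, ap, x_hat, a_hat)
        if d < best_d:
            best_perm, best_d = perm, d
    return best_perm, best_d


def matching_loss(x, a, x_hat, a_hat):
    """Step 2: evaluate the reconstruction loss under the matched ordering."""
    perm, _ = best_permutation(x, a, x_hat, a_hat)
    xp, ap = permute_graph(x, a, perm)
    return graph_distance(xp, ap, x_hat, a_hat), perm
```

In an automatic-differentiation framework, step 1 would typically run without gradient tracking, since only the selected permutation is needed, and step 2 would be computed on the decoder's output tensors so that gradients flow through $L_{rec}$. The exhaustive loop makes the factorial cost explicit, which is why this form is only practical for small graphs.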