GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

Martin Simonovsky, Nikos Komodakis

TL;DR

GraphVAE tackles graph generation by formulating a variational autoencoder whose decoder outputs a probabilistic graph of a fixed maximum size. It relies on a graph-matching-based reconstruction objective and a differentiable training loop in which a discrete assignment, computed with the Hungarian algorithm, aligns each generated graph with its ground truth. Evaluations on QM9 and ZINC show that the method can generate chemically valid small molecules and highlight the impact of graph matching on performance, while also exposing scalability challenges for larger graphs. The work points toward end-to-end graph decoders and outlines concrete avenues for improvement in priors, conditioning, and larger-graph generation.
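The discrete assignment step mentioned above can be illustrated with a minimal sketch. It assumes a soft similarity matrix of shape (k, n) between the k predicted nodes and the n ground-truth nodes is already available; the function name `discretize_assignment` and the toy scores are hypothetical, and SciPy's `linear_sum_assignment` stands in for whatever Hungarian implementation the authors used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def discretize_assignment(scores):
    """Turn a soft (k x n) node-similarity matrix into a 0/1 assignment matrix.

    Each ground-truth node (column) is matched to exactly one predicted node
    (row) so that the total similarity is maximal. Assumes k >= n.
    """
    scores = np.asarray(scores, dtype=float)
    # Solve the linear assignment problem; maximize=True because the
    # entries are similarities rather than costs.
    rows, cols = linear_sum_assignment(scores, maximize=True)
    assignment = np.zeros_like(scores, dtype=int)
    assignment[rows, cols] = 1
    return assignment


# Toy example (hypothetical values): k = 3 predicted nodes, n = 2 true nodes.
scores = np.array([
    [0.1, 0.8],
    [0.7, 0.2],
    [0.3, 0.3],
])
X = discretize_assignment(scores)
print(X)
```

In this toy case the third predicted node is left unmatched, which mirrors the situation where the decoder's fixed size k exceeds the number of nodes n in the input graph.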

Abstract

Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by having a decoder output a probabilistic fully-connected graph of a predefined maximum size directly at once. Our method is formulated as a variational autoencoder. We evaluate on the challenging task of molecule generation.

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of the proposed variational graph autoencoder. Starting from a discrete attributed graph $G=(A,E,F)$ on $n$ nodes (e.g. a representation of propylene oxide), a stochastic graph encoder ${q_{\phi}(\mathbf{z}|G)}$ embeds the graph into a continuous representation $\mathbf{z}$. Given a point in the latent space, our novel graph decoder ${p_{\theta}(G|\mathbf{z})}$ outputs a probabilistic fully-connected graph $\widetilde{G}=(\widetilde{A},\widetilde{E},\widetilde{F})$ on predefined $k \geq n$ nodes, from which discrete samples may be drawn. The process can be conditioned on label $\mathbf{y}$ for controlled sampling at test time. Reconstruction ability of the autoencoder is facilitated by approximate graph matching for aligning $G$ with $\widetilde{G}$.
  • Figure 2: Decodings of latent space points of a conditional model sampled over a random 2D plane in $\mathbf{z}$-space of $c=40$ (within 5 units from center of coordinates). Left: Samples conditioned on 7x Carbon, 1x Nitrogen, 1x Oxygen (12% QM9). Right: Samples conditioned on 5x Carbon, 1x Nitrogen, 3x Oxygen (2.6% QM9). Color legend as in Figure 3.
  • Figure 3: Linear interpolation between row-wise pairs of randomly chosen molecules in $\mathbf{z}$-space of $c=40$ in a conditional model. Color legend: encoder inputs (green), chemically invalid graphs (red), valid graphs with wrong label (blue), valid and correct (white).
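The linear interpolation described in Figure 3 can be sketched as follows. This is only an illustration of how the intermediate latent points might be computed: the two endpoint vectors are random stand-ins for encoder outputs, the function name `interpolate_latents` and the number of steps are hypothetical, and the decoder that would turn each point into a molecule is not shown.

```python
import numpy as np


def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps points evenly spaced on the segment z_start -> z_end."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # Interpolation weights from 0 to 1, shaped (num_steps, 1) so they
    # broadcast against the latent vectors of shape (c,).
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


c = 40  # latent dimensionality quoted in the figure captions
rng = np.random.default_rng(0)
z_a = rng.standard_normal(c)  # stand-in for the encoding of one molecule
z_b = rng.standard_normal(c)  # stand-in for the encoding of another molecule

path = interpolate_latents(z_a, z_b, num_steps=7)
print(path.shape)  # (7, 40)
```

Each row of `path` would then be passed to the decoder, together with the conditioning label in the conditional setting, to obtain one molecule per interpolation step.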