The quest for the GRAph Level autoEncoder (GRALE)

Paul Krzakala, Gabriel Melo, Charlotte Laclau, Florence d'Alché-Buc, Rémi Flamary

TL;DR

GRALE tackles graph-level representation learning by encoding entire graphs into a shared Euclidean space and decoding them back into graphs of varying sizes. Instead of relying on a costly graph matching solver, it uses a differentiable, learnable matching module and optimizes a graph-edit-distance-like objective through an Optimal Transport loss. The architecture combines an Evoformer-based encoder with a novel Evoformer Decoder. Experiments on synthetic COLORING data and on large molecular datasets such as PUBCHEM show strong reconstruction quality and broad downstream applicability, including graph classification, regression, graph prediction, and graph matching. By enabling end-to-end pretraining on large graph corpora, GRALE offers a versatile foundation for graph-based tasks. Its main limitation is computational complexity, and the authors point to scalable attention and faster differentiation of the Sinkhorn projection as future work.
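The TL;DR mentions a Sinkhorn projection behind the differentiable matching. As a minimal sketch, assuming the matching module produces a square `(N, N)` score matrix that must be mapped to a soft, approximately doubly stochastic node assignment, the classic Sinkhorn-Knopp row/column normalization can be written in log space for numerical stability. The function `sinkhorn_projection` and its parameters `tau` and `n_iter` are illustrative assumptions, not GRALE's implementation.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dims."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn_projection(scores, tau=1.0, n_iter=200):
    """Map an (N, N) score matrix to an approximately doubly stochastic matrix.

    Hypothetical sketch: alternately normalize rows and columns in log space
    (Sinkhorn-Knopp). Every step is smooth, so in an autodiff framework the
    result stays differentiable with respect to `scores`.
    """
    log_t = scores / tau  # smaller tau -> sharper, closer to a permutation
    for _ in range(n_iter):
        log_t = log_t - _logsumexp(log_t, axis=1)  # rows sum to 1
        log_t = log_t - _logsumexp(log_t, axis=0)  # columns sum to 1
    return np.exp(log_t)
```

Lowering `tau` pushes the output toward a hard permutation matrix, at the price of slower convergence; `n_iter` controls how tightly the row and column constraints are met.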

Abstract

Although graph-based learning has attracted a lot of attention, graph representation learning is still a challenging task whose resolution may impact key application fields such as chemistry or biology. To this end, we introduce GRALE, a novel graph autoencoder that encodes and decodes graphs of varying sizes into a shared embedding space. GRALE is trained using an Optimal Transport-inspired loss that compares the original and reconstructed graphs and leverages a differentiable node matching module, which is trained jointly with the encoder and decoder. The proposed attention-based architecture relies on Evoformer, the core component of AlphaFold, which we extend to support both graph encoding and decoding. We show, in numerical experiments on simulated and molecular data, that GRALE enables a highly general form of pre-training, applicable to a wide range of downstream tasks, from classification and regression to more complex tasks such as graph interpolation, editing, matching, and prediction.
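The abstract describes an Optimal Transport-inspired loss that compares the original and reconstructed graphs through a soft node matching. As a minimal sketch, assuming a Fused-Gromov-Wasserstein-style form with squared errors on node features and adjacency entries, both graphs padded to the same size, and a mixing weight `alpha` (all assumptions; this is not the paper's exact $\mathcal{L}_\text{OT}$), the loss for a given matching matrix `T` can be evaluated naively as follows. The function `ot_graph_loss` is hypothetical.

```python
import numpy as np


def ot_graph_loss(F, C, F_hat, C_hat, T, alpha=0.5):
    """Hypothetical OT-style reconstruction loss between two padded graphs.

    F, F_hat: (N, d) node features; C, C_hat: (N, N) adjacency matrices;
    T: (N, N) soft matching, T[i, k] = weight of matching input node i to
    output node k. Squared errors are an illustrative choice.
    """
    # Node term: sum_{i,k} ||F_i - F_hat_k||^2 * T_ik
    node_cost = ((F[:, None, :] - F_hat[None, :, :]) ** 2).sum(axis=-1)
    node_term = np.sum(node_cost * T)
    # Edge term: sum_{i,j,k,l} (C_ij - C_hat_kl)^2 * T_ik * T_jl
    # (naive evaluation, materializes an N^4 tensor)
    edge_cost = (C[:, :, None, None] - C_hat[None, None, :, :]) ** 2
    edge_term = np.einsum("ijkl,ik,jl->", edge_cost, T, T)
    return (1 - alpha) * node_term + alpha * edge_term
```

With a hard permutation matrix as `T`, this loss is zero when the reconstruction is a relabeled copy of the input under that permutation, and positive when the matching pairs mismatched nodes.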

Paper Structure

This paper contains 84 sections, 4 theorems, 62 equations, 16 figures, 12 tables, 2 algorithms.

Key Result

Proposition 1

If $\ell_C$ is a Bregman divergence, then $\mathcal{L}_\text{OT}(G,\hat{G},T)$ can be computed in $\mathcal{O}(N^3)$ time.
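Proposition 1 can be illustrated with a sketch. Assuming the structure part of $\mathcal{L}_\text{OT}$ has the Gromov-Wasserstein-like form $\sum_{i,j,k,l} \ell_C(C_{ij}, \hat{C}_{kl}) T_{ik} T_{jl}$ (an assumption; the paper's exact definition is not reproduced here), a naive evaluation costs $\mathcal{O}(N^4)$. A Bregman divergence $D_\phi(a,b) = \phi(a) + \big(\phi'(b)\,b - \phi(b)\big) - a\,\phi'(b)$ splits into terms depending on $a$ alone, on $b$ alone, and a product $a\,\phi'(b)$, so the sum reduces to matrix-vector and matrix-matrix products, in the spirit of the factorization of Peyré, Cuturi and Solomon (2016) for Gromov-Wasserstein. The function `bregman_structure_loss` is hypothetical and not taken from the paper.

```python
import numpy as np


def bregman_structure_loss(C, C_hat, T, phi=np.square, dphi=lambda x: 2 * x):
    """Compute sum_{i,j,k,l} D_phi(C_ij, C_hat_kl) * T_ik * T_jl in O(N^3).

    Uses D_phi(a, b) = phi(a) + (dphi(b) * b - phi(b)) - a * dphi(b).
    The default phi(x) = x^2 gives the squared loss (a - b)^2.
    """
    p = T.sum(axis=1)                        # p_i = sum_k T_ik
    q = T.sum(axis=0)                        # q_k = sum_i T_ik
    f1 = phi(C)                              # part depending only on a = C_ij
    f2 = dphi(C_hat) * C_hat - phi(C_hat)    # part depending only on b = C_hat_kl
    h2 = dphi(C_hat)                         # multiplies a in the cross term
    term_a = p @ f1 @ p                      # sum_ij f1(C_ij) p_i p_j
    term_b = q @ f2 @ q                      # sum_kl f2(C_hat_kl) q_k q_l
    cross = np.sum(T * (C @ T @ h2.T))       # sum_ijkl C_ij h2_kl T_ik T_jl
    return term_a + term_b - cross
```

The two matrix products in `cross` dominate the cost, giving $\mathcal{O}(N^3)$ instead of the $\mathcal{O}(N^4)$ of a direct four-index sum; passing `phi=lambda x: x * np.log(x) - x` and `dphi=np.log` gives the generalized KL divergence for positive matrices.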

Figures (16)

  • Figure 1: The different classes of graph AutoEncoders. (Left) Node-level AutoEncoders such as kipf2016variational provide node-level embeddings. (Center) Naive graph-level AutoEncoders such as simonovsky2018graphvae directly provide graph-level embeddings but rely on a graph matching solver to compute the training loss. (Right) Matching-free approaches, such as the one proposed in this work and winter2021permutation, use a learnable module to provide the matching.
  • Figure 2: GRALE illustrated for an input of size $n=3$ and a maximum output graph size $N=4$.
  • Figure 3: G.I. accuracy vs $(K,D)$. Both axes use a log scale, so each diagonal corresponds to a fixed total dimension $d = K \times D$.
  • Figure 4: Interpolating graphs (from COLORING) using GRALE's latent space. On the left we interpolate between two graphs while on the right we compute the barycenter $\Bar{G}$ of the whole dataset.
  • Figure 5: Latent-space editing of graph size. Here, $\hat{n}$ is a one-hidden-layer MLP trained to predict graph size, and we set $\epsilon=0.01$. Steps that did not produce any visible change are omitted.
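The Figure 5 caption describes editing graph size in latent space with an MLP size predictor $\hat{n}$ and a step $\epsilon=0.01$. As a minimal sketch, assuming the edit is plain gradient ascent $z \leftarrow z + \epsilon \nabla_z \hat{n}(z)$ on a latent code treated as a flat vector (the exact update rule is not given here), the procedure could look as follows. The tanh MLP, its random placeholder weights, and the functions `mlp_size`, `mlp_size_grad`, and `edit_latent` are hypothetical, and decoding each step back into a graph with GRALE's decoder is omitted.

```python
import numpy as np


def mlp_size(z, W1, b1, w2, b2):
    """Hypothetical one-hidden-layer size predictor n_hat(z) = w2 . tanh(W1 z + b1) + b2."""
    return w2 @ np.tanh(W1 @ z + b1) + b2


def mlp_size_grad(z, W1, b1, w2, b2):
    """Analytic gradient of n_hat with respect to the latent code z."""
    h = np.tanh(W1 @ z + b1)
    return W1.T @ (w2 * (1.0 - h ** 2))  # chain rule through tanh


def edit_latent(z, params, eps=0.01, n_steps=100):
    """Assumed editing rule: repeat z <- z + eps * grad n_hat(z).

    Returns the latent trajectory; each code would then be decoded into a
    graph by the (not shown) GRALE decoder.
    """
    path = [z]
    for _ in range(n_steps):
        z = z + eps * mlp_size_grad(z, *params)
        path.append(z)
    return path
```

Following the caption, each latent code along such a trajectory would be decoded, and steps whose decoded graph does not change would be dropped from the figure.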

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4