Generalized Graph Transformer Variational Autoencoder

Siddhant Karki

TL;DR

The paper tackles link prediction on graphs by replacing traditional message passing with a Generalized Graph Transformer Variational Autoencoder (GGT-VAE) that uses Laplacian positional encodings and global self-attention to learn a probabilistic latent space. It demonstrates that transformer-based encoders can capture both local and global graph structure without neighborhood aggregation, achieving competitive ROC-AUC and average precision (AP) on the Planetoid datasets Cora and Citeseer. Through qualitative attention maps and quantitative metrics such as normalized globality, the authors show how the model reasons over long-range relationships, and ablations confirm that performance stays robust across reasonable hyperparameter ranges. The work highlights the potential of combining graph transformers with variational inference for scalable graph generation and link prediction; future directions include full graph generation, molecular design, and multimodal conditioning.
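ROC-AUC and AP for link prediction are typically computed by scoring held-out true edges (positives) against sampled non-edges (negatives). The minimal NumPy sketch below computes both metrics on hypothetical toy scores. It is not the authors' evaluation code. ROC-AUC comes from the Mann-Whitney rank statistic, and AP is the mean precision at the rank of each positive edge. The sketch also assumes there are no tied scores, since ties would require average ranks.

```python
import numpy as np

def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive edge outranks a random negative one (assumes no ties)."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1  # 1-based ascending ranks
    n_pos, n_neg = len(pos), len(neg)
    # Mann-Whitney U for the positives, normalized by the number of (pos, neg) pairs
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(pos_scores, neg_scores):
    """Mean of precision@k taken at each rank k where a positive edge appears."""
    labels = np.concatenate([np.ones(len(pos_scores)), np.zeros(len(neg_scores))])
    scores = np.concatenate([pos_scores, neg_scores]).astype(float)
    hits = labels[np.argsort(-scores)]  # labels sorted by descending score
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits == 1].mean()

pos = [0.9, 0.8, 0.4]  # hypothetical scores for held-out true edges
neg = [0.7, 0.3, 0.1]  # hypothetical scores for sampled non-edges
print(round(roc_auc(pos, neg), 4))            # 0.8889
print(round(average_precision(pos, neg), 4))  # 0.9167
```

In practice, library routines such as scikit-learn's `roc_auc_score` and `average_precision_score` handle ties and edge cases. This hand-rolled version only makes the two definitions explicit.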

Abstract

Link prediction on graphs has long been a central problem in graph representation learning, with applications in both network analysis and generative modeling. Recent progress in deep learning has introduced increasingly sophisticated architectures for capturing relational dependencies within graph-structured data. In this work, we propose the Generalized Graph Transformer Variational Autoencoder (GGT-VAE), which integrates a Generalized Graph Transformer architecture with a Variational Autoencoder framework for link prediction. Unlike prior GraphVAE, GCN, and other GNN approaches, GGT-VAE uses transformer-style global self-attention together with Laplacian positional encodings to encode structural patterns across nodes into a latent space without relying on message passing. Experimental results on several benchmark datasets show that GGT-VAE consistently achieves above-baseline ROC-AUC and Average Precision. To the best of our knowledge, this is among the first studies to explore graph structure generation using a generalized graph transformer backbone in a variational framework.
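Laplacian positional encodings give each node coordinates derived from the graph's spectrum, which a transformer can use in place of an explicit adjacency structure. The NumPy sketch below shows one common construction. It relies on assumptions the text does not confirm: it uses the symmetric normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$, skips the trivial first eigenvector, and keeps the next $k$. The function name and the toy graph are hypothetical.

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    """Return k Laplacian eigenvectors per node as positional encodings.

    Assumption: symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2};
    the first (trivial) eigenvector is dropped.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt_deg = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt_deg[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep 0
    lap = np.eye(len(adj)) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(lap)
    return eigvecs[:, 1:k + 1]  # shape (num_nodes, k)

# Toy 4-cycle: 0-1-2-3-0
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
pe = laplacian_positional_encoding(cycle, k=2)
print(pe.shape)  # (4, 2)
```

Eigenvector signs are arbitrary, so a common practice is to flip them randomly during training. This section does not say how GGT-VAE combines these encodings with node features. Concatenation and addition after a linear projection are both plausible choices.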

Paper Structure

This paper contains 28 sections, 17 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Architecture of the encoder–decoder framework used in our model. The encoder maps node and positional embeddings into a latent space, while the decoder reconstructs the adjacency matrix from the latent variables.
  • Figure 2: Attention maps for selected heads across Transformer layers. Lighter regions correspond to stronger attention weights between node pairs. Early layers focus on local neighborhoods, while deeper layers capture more global structural dependencies.
  • Figure 3: 2D t-SNE projection of the encoder's latent embeddings on the Cora dataset. Nodes belonging to similar communities cluster together, indicating that the model captures meaningful relational structure in the latent space.
  • Figure 4: Attention vs. Graph Distance. Average attention weight for different shortest-path distances (SPD) on Cora. Unlike message-passing models that only attend to direct neighbors ($\mathrm{SPD}=1$), the Transformer assigns nonzero weight even to distant nodes ($\mathrm{SPD}>10$), showing that it captures both local and global structure without relying on adjacency-based message passing.
  • Figure 5: Normalized globality across layers and heads. The solid line shows the layer average; dashed lines show individual heads. Globality rises in early layers (broader attention) and decreases in the final layer (local refinement). Diverging head trends show that some heads specialize in global structure while others focus on local details.
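The encoder-decoder pipeline in Figure 1 can be sketched as a single forward pass. This NumPy version is a toy under stated assumptions and does not reproduce the paper's architecture. It uses one single-head attention layer and randomly initialized weights with hypothetical names (`w_in`, `w_q`, `w_k`, `w_v`, `w_mu`, `w_logvar`). It also assumes a VGAE-style inner-product decoder $\hat{A} = \sigma(ZZ^\top)$, because the caption only says that the decoder reconstructs the adjacency matrix from the latent variables. The key point is that attention covers every node pair with no adjacency mask, so no message passing takes place.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(h, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over ALL node pairs (no adjacency mask)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (n, n), each row sums to 1
    return attn @ v, attn

def encode(x, pe, params):
    """Hypothetical encoder: [features || positional encodings] -> attention -> (mu, logvar)."""
    h = np.concatenate([x, pe], axis=1) @ params["w_in"]
    h_attn, attn = global_self_attention(h, params["w_q"], params["w_k"], params["w_v"])
    h = h + h_attn  # residual connection
    return h @ params["w_mu"], h @ params["w_logvar"], attn

def reparameterize(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z):
    """Assumed inner-product decoder: edge probability sigmoid(z_i . z_j) for every pair."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

def vae_loss(adj, adj_hat, mu, logvar, eps=1e-9):
    """Unweighted sum of pairwise reconstruction BCE and per-node KL(q(z|x) || N(0, I))."""
    bce = -np.mean(adj * np.log(adj_hat + eps) + (1 - adj) * np.log(1 - adj_hat + eps))
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return bce + kl

# Toy run on a 4-cycle with random node features and stand-in positional encodings.
n, f, p_dim, d, latent = 4, 3, 2, 8, 2
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
x = rng.standard_normal((n, f))
pe = rng.standard_normal((n, p_dim))  # placeholder for Laplacian positional encodings
shapes = {"w_in": (f + p_dim, d), "w_q": (d, d), "w_k": (d, d), "w_v": (d, d),
          "w_mu": (d, latent), "w_logvar": (d, latent)}
params = {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}

mu, logvar, attn = encode(x, pe, params)
z = reparameterize(mu, logvar)
adj_hat = decode(z)
loss = vae_loss(adj, adj_hat, mu, logvar)
```

Every entry of `attn` is positive, so even non-adjacent nodes exchange information within a single layer. Figure 2 visualizes matrices of this kind, one per head and layer. Real implementations usually reweight the two loss terms and sample negative pairs rather than scoring all of them.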
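Figures 4 and 5 summarize attention in terms of shortest-path distance (SPD). The sketch below shows how such statistics could be computed from one attention matrix. BFS gives the all-pairs SPD, and attention weights are then averaged per distance. The `globality_proxy` function is a hypothetical stand-in (attention-weighted mean SPD divided by the graph diameter) and is not the paper's definition of normalized globality, which this section does not give. The code also assumes a connected, unweighted graph.

```python
from collections import deque

import numpy as np

def shortest_path_distances(adj):
    """All-pairs hop distances via BFS from every node (assumes a connected, unweighted graph)."""
    n = len(adj)
    neighbors = [np.flatnonzero(adj[i]) for i in range(n)]
    spd = np.full((n, n), -1, dtype=int)  # -1 marks "not reached yet"
    for src in range(n):
        spd[src, src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if spd[src, v] < 0:
                    spd[src, v] = spd[src, u] + 1
                    queue.append(v)
    return spd

def attention_by_distance(attn, spd):
    """Mean attention weight for each SPD value, excluding self-pairs (SPD = 0)."""
    return {int(dist): float(attn[spd == dist].mean()) for dist in np.unique(spd) if dist > 0}

def globality_proxy(attn, spd):
    """HYPOTHETICAL proxy: per-node attention-weighted mean SPD, averaged, divided by the diameter."""
    per_node = (attn * spd).sum(axis=1) / attn.sum(axis=1)
    return float(per_node.mean() / spd.max())

# Path graph 0-1-2-3 (diameter 3).
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
spd = shortest_path_distances(path)
uniform = np.full((4, 4), 0.25)                   # attends to every node equally
one_hop = path / path.sum(axis=1, keepdims=True)  # attends only to direct neighbors

print(attention_by_distance(uniform, spd))  # {1: 0.25, 2: 0.25, 3: 0.25}
print(attention_by_distance(one_hop, spd))
print(globality_proxy(uniform, spd), globality_proxy(one_hop, spd))
```

The two toy matrices are the extremes. Uniform attention gives every distance the same mean weight. The row-normalized adjacency, which is the pattern a single one-hop message-passing step implies, puts all of its weight at SPD = 1 and none beyond it.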