
Deep Generative Models for Subgraph Prediction

Erfaneh Mahmoudzadeh, Parmis Naddaf, Kiarash Zahirnia, Oliver Schulte

TL;DR

This work introduces subgraph queries as a flexible, probabilistic prediction task on graphs and presents VGAE+, an augmented variational graph auto-encoder that jointly models links, node features, and node labels. The model is trained inductively with an ELBO objective, and the relative weights of its link, feature, and label terms are tuned by Bayesian optimization; once trained, it answers diverse subgraph queries zero-shot, without retraining. Two inference modes (deterministic inference from posterior means and Monte Carlo sampling) estimate conditional subgraph probabilities from the trained graph generative model (GGM), and are evaluated on six benchmark datasets. Empirically, VGAE+ yields strong joint predictions for subgraph queries, often outperforming baselines that predict components independently, which highlights the value of co-training on structure and attributes. One limitation is the restriction to homogeneous graphs; future work points to richer GGMs, weighted and dynamic graphs, and knowledge-graph-oriented extensions.

Abstract

Graph Neural Networks (GNNs) are important across different domains, such as social network analysis and recommendation systems, due to their ability to model complex relational data. This paper introduces subgraph queries as a new task for deep graph learning. Unlike traditional graph prediction tasks that focus on individual components like link prediction or node classification, subgraph queries jointly predict the components of a target subgraph based on evidence that is represented by an observed subgraph. For instance, a subgraph query can predict a set of target links and/or node labels. To answer subgraph queries, we utilize a probabilistic deep Graph Generative Model. Specifically, we inductively train a Variational Graph Auto-Encoder (VGAE) model, augmented to represent a joint distribution over links, node features and labels. Bayesian optimization is used to tune a weighting for the relative importance of links, node features and labels in a specific domain. We describe a deterministic and a sampling-based inference method for estimating subgraph probabilities from the VGAE generative graph distribution, without retraining, in zero-shot fashion. For evaluation, we apply the inference methods on a range of subgraph queries on six benchmark datasets. We find that inference from a model achieves superior predictive performance, surpassing independent prediction baselines with improvements in AUC scores ranging from 0.06 to 0.2 points, depending on the dataset.
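
The difference between the deterministic and the sampling-based inference method can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an inner-product link decoder (as in the standard VGAE), a diagonal Gaussian posterior over node embeddings, and a query that asks only about links. The names `mu`, `sigma`, `joint_link_prob`, `deterministic_estimate`, and `monte_carlo_estimate` are hypothetical, and the toy values of `mu` and `sigma` stand in for what an encoder would produce from the evidence subgraph.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def joint_link_prob(z, links, values):
    """Probability that every target link takes its queried value, given embeddings z.

    Assumption: inner-product decoder, p(A_ij = 1 | z) = sigmoid(z_i . z_j),
    with target links independent given z.
    """
    i, j = links[:, 0], links[:, 1]
    p = sigmoid(np.sum(z[i] * z[j], axis=1))
    return float(np.prod(np.where(values == 1, p, 1.0 - p)))


def deterministic_estimate(mu, links, values):
    """Plug the posterior means in as the embeddings (single decoder pass)."""
    return joint_link_prob(mu, links, values)


def monte_carlo_estimate(mu, sigma, links, values, num_samples=1000, seed=0):
    """Average the joint probability over embeddings sampled from N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        z = mu + sigma * rng.standard_normal(mu.shape)
        total += joint_link_prob(z, links, values)
    return total / num_samples


# Toy posterior for three nodes (hypothetical values, not taken from the paper).
mu = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
sigma = np.full_like(mu, 0.1)

# Query: link (0, 1) is present AND link (0, 2) is absent.
links = np.array([[0, 1], [0, 2]])
values = np.array([1, 0])

det = deterministic_estimate(mu, links, values)
mc = monte_carlo_estimate(mu, sigma, links, values)
print(f"deterministic: {det:.4f}  monte carlo: {mc:.4f}")
```

In this sketch the deterministic estimate needs one decoder pass, while the Monte Carlo estimate averages over sampled embeddings and therefore reflects posterior uncertainty at the cost of more computation. Node labels and features would need their own decoder terms in the product, which are omitted here.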

Paper Structure (59 sections, 12 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Example of joint link prediction and node classification: predicting potential actors for Christopher Nolan's new movie and its genre. Target links are dashed lines marked with "?". Solid links are specified as evidence.
  • Figure 2: After training a single GGM, the approximate inference methods described in this paper can answer a user query.
  • Figure 3: Left: Input graph with partition of nodes. Node labels are green and blue. Right: Neighborhood query: Target labels and links are colored red. Black dashed links are unspecified links in evidence.
  • Figure 4: Encoder-Decoder Training Architecture
  • Figure 5: Hyper-parameters tuning process
  • ...and 5 more figures