Sub-graph Based Diffusion Model for Link Prediction

Hang Li; Wei Jin; Geri Skenderi; Harry Shomer; Wenzhuo Tang; Wenqi Fan; Jiliang Tang

TL;DR

This work introduces SGDiff, a diffusion-based framework for link prediction on sub-graphs that uses Bayes' rule to factorize $p(y|\mathbf{A},\mathbf{X})$ into structure and feature components. It deploys two diffusion streams with a transformer-based denoiser and ELBO-guided training: discrete structure diffusion for adjacency and orbit-augmented sub-graph features, and continuous Gaussian diffusion for node features. A learnable fusion mechanism (with parameters $\{\eta_1,\eta_2,\delta\}$) combines the diffusion outputs to estimate $p(y|\mathbf{A},\mathbf{X})$, enabling cross-dataset transfer and robustness to adversarial graph perturbations. Empirical results across six datasets show that SGDiff often outperforms baselines in cross-dataset transfer, remains resilient with limited training data, and shows robustness advantages under adversarial attacks. These results point to the potential of diffusion-based generative approaches for transferable, robust, and data-efficient link prediction, with implications for broader graph tasks through shared structure diffusion and dataset-tailored feature diffusion.

Abstract

Denoising Diffusion Probabilistic Models (DDPMs) are a contemporary class of generative models with exceptional qualities in both synthesis and data-likelihood maximization. These models traverse a forward Markov chain in which data is perturbed, followed by a reverse process in which a neural network learns to undo the perturbations and recover the original data. There have been increasing efforts to apply DDPMs in the graph domain, but most of them have focused on the generative perspective. In this paper, we aim to build a novel generative model for link prediction. In particular, we treat link prediction between a pair of nodes as a conditional likelihood estimation of its enclosing sub-graph. With a dedicated design that decomposes the likelihood estimation process via the Bayesian formula, we are able to separate the estimation of the sub-graph structure from that of its node features. This design allows our model to enjoy both the advantages of inductive learning and strong generalization capability. Comprehensive experiments across various datasets validate that our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
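
The decomposition described in the abstract can be written out with the plain Bayes identity. This is a sketch built only from the quantities named in the TL;DR and the Figure 1 caption ($P_{\theta}(\mathbf{A}|y)$ and $P_{\phi}(\mathbf{X}|\mathbf{A},y)$). How the paper treats the prior $p(y)$ and the evidence term, and how the learnable $\{\eta_1,\eta_2,\delta\}$ weight the terms, is not stated on this page, so those details are left out.

```latex
\begin{align}
p(y \mid \mathbf{A}, \mathbf{X})
  &= \frac{p(\mathbf{A}, \mathbf{X} \mid y)\, p(y)}{p(\mathbf{A}, \mathbf{X})} \\
  &= \frac{p(\mathbf{A} \mid y)\, p(\mathbf{X} \mid \mathbf{A}, y)\, p(y)}{p(\mathbf{A}, \mathbf{X})} \\
\log p(y \mid \mathbf{A}, \mathbf{X})
  &= \log p(\mathbf{A} \mid y) + \log p(\mathbf{X} \mid \mathbf{A}, y)
     + \log p(y) - \log p(\mathbf{A}, \mathbf{X})
\end{align}
```

The first two log terms are the ones a structure diffusion model and a feature diffusion model would estimate separately; the last term does not depend on $y$, so it cancels when comparing $y=1$ against $y=0$ for the same sub-graph.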

Paper Structure (21 sections, 12 equations, 3 figures, 7 tables, 1 algorithm)


Introduction
Related Work
Sub-graph Based Link Prediction
Likelihood Estimation of Diffusion Models
Method
Notations
Design Overview
Structure Diffusion Model
Node Diffusion Model
Connection Probability Estimation
Experiments
General Experimental Settings
Performance on Cross-data Transferability
Performance with Train Size Constraint
Performance in Terms of Robustness
...and 6 more sections

Figures (3)

Figure 1: An overview of our proposed framework. $\mathbf{Q}^t$ and $q$ are diffusion kernels for structure and feature diffusion models, respectively. The calculation of log-likelihood scores $\log P_{\theta}(\mathbf{A}|y)$ and $\log P_{\phi}(\mathbf{X}|\mathbf{A},y)$ is based on fitted denoising models, $p_{\phi}(\mathbf{A}^{(0)}|\mathbf{A}^{(t)},y)$ and $p_{\epsilon}(\mathbf{X}^{(t-1)}|\mathbf{X}^{(t)},\mathbf{A}^{(0)},y)$, respectively.
Figure 2: Model Performance on Cora / Citeseer / Pubmed / Router / NS / USAir datasets under the limited (1%) training set scenario.
Figure 3: Models' robustness against the DICE attack on Cora / Citeseer / Pubmed datasets.
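
The two forward kernels named in the Figure 1 caption, $\mathbf{Q}^t$ for structure and $q$ for features, can be illustrated with a minimal NumPy sketch. It makes several assumptions: the uniform-resampling form of the discrete kernel is a common choice for discrete diffusion and is not confirmed here as the paper's $\mathbf{Q}^t$; the function names, noise schedule, and toy graph are hypothetical; and the conditioning on $y$, the orbit features, and the denoisers are omitted.

```python
import numpy as np


def uniform_transition(beta: float) -> np.ndarray:
    """One-step 2x2 kernel for a binary edge state (assumed uniform kernel)."""
    # With probability beta the edge state is resampled uniformly from {0, 1}.
    return (1.0 - beta) * np.eye(2) + beta * np.full((2, 2), 0.5)


def noise_structure(adj, betas, t, rng):
    """Sample A^(t) from A^(0) using the cumulative product of t one-step kernels."""
    q_bar = np.eye(2)
    for beta in betas[:t]:
        q_bar = q_bar @ uniform_transition(beta)
    # Probability that each adjacency entry equals 1 after t steps.
    p_one = q_bar[adj.astype(int), 1]
    # Sample only the strict upper triangle, then mirror it so the graph
    # stays undirected and free of self-loops.
    upper = np.triu(rng.random(adj.shape) < p_one, k=1)
    return (upper | upper.T).astype(int)


def noise_features(x, betas, t, rng):
    """Sample X^(t) ~ N(sqrt(abar_t) X^(0), (1 - abar_t) I), the standard DDPM form."""
    alpha_bar = np.prod(1.0 - betas[:t])
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.2, 50)                   # hypothetical noise schedule
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])    # toy 3-node path graph
x = rng.standard_normal((3, 4))                      # toy node features
adj_t = noise_structure(adj, betas, t=50, rng=rng)
x_t = noise_features(x, betas, t=50, rng=rng)
```

Keeping the two samplers separate mirrors the split between the structure and feature streams: the structure sampler never touches the node features, which is what would let a structure model be shared across datasets with different feature spaces.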
