Diffusion-based Negative Sampling on Graphs for Link Prediction

Trung-Kien Nguyen, Yuan Fang

TL;DR

This work tackles robust negative sampling for graph link prediction by introducing DMNS, a conditional diffusion-based framework that generates multi-level negative nodes in the latent space, conditioned on the query node. Sampling negatives at different diffusion time steps gives controllable hardness, which supports more effective contrastive learning and reduces reliance on negatives drawn from existing graph substructures. The authors show that the generated negatives follow the sub-linear positivity principle (Theorem 1) and report empirical gains over a broad set of baselines across multiple datasets and encoders. The approach is scalable and adaptable to different GNN backbones, and may extend to graph learning tasks beyond link prediction.

Abstract

Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined heuristics or automatic adversarial approaches, which might be inflexible or difficult to control. Furthermore, in the context of link prediction, most previous methods sample negative nodes from existing substructures of the graph, missing out on potentially better samples in the latent space. To address these issues, we investigate a novel strategy of multi-level negative sampling that enables negative node generation with flexible and controllable "hardness" levels from the latent space. Our method, called Conditional Diffusion-based Multi-level Negative Sampling (DMNS), leverages the Markov chain property of diffusion models to generate negative nodes at multiple levels of variable hardness and reconcile them for effective graph link prediction. We further demonstrate that DMNS follows the sub-linear positivity principle for robust negative sampling. Extensive experiments on several benchmark datasets demonstrate the effectiveness of DMNS.
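
To make the idea of time-step-controlled hardness concrete, here is a minimal sketch rather than the authors' implementation. The paper's model is learned and conditioned on the query node; this sketch omits both and uses the standard closed-form forward noising of a diffusion model as a stand-in, purely to show that samples taken at larger time steps drift further from a positive embedding and so act as easier negatives. The function names, the linear beta schedule, the embedding size, and the chosen time steps are all assumptions made for illustration.

```python
import numpy as np


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def multi_level_negatives(x_pos, time_steps, alpha_bar, rng):
    """Noise a positive embedding to several diffusion time steps.

    Returns {t: x_t} with x_t = sqrt(abar_t) * x_pos + sqrt(1 - abar_t) * eps,
    where eps is standard Gaussian noise. Time steps are 1-indexed.
    """
    samples = {}
    for t in time_steps:
        eps = rng.standard_normal(x_pos.shape)
        a = alpha_bar[t - 1]
        samples[t] = np.sqrt(a) * x_pos + np.sqrt(1.0 - a) * eps
    return samples


rng = np.random.default_rng(0)
alpha_bar = alpha_bar_schedule(num_steps=100)
x_pos = rng.standard_normal(64)   # hypothetical positive-neighbor embedding
levels = [10, 30, 60, 100]        # hypothetical hardness levels (time steps)
negatives = multi_level_negatives(x_pos, levels, alpha_bar, rng)

# Small t stays close to the positive (hard); large t is mostly noise (easy).
for t, x_t in negatives.items():
    print(t, float(np.linalg.norm(x_t - x_pos)))
```

In a contrastive objective, samples from several such levels could be combined with different weights; how DMNS reconciles its levels is described in the paper, not in this sketch.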

Paper Structure (34 sections, 1 theorem, 13 equations, 6 figures, 5 tables, 1 algorithm)

This paper contains 34 sections, 1 theorem, 13 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Consider a query node $v$. Let $\mathbf{x}_n \sim \mathcal{N}(\mu_{t,\theta},\Sigma_{t,\theta})$ and $\mathbf{x}_p \sim \mathcal{N}(\mu_{0,\theta},\Sigma_{0,\theta})$ represent samples drawn from the negative and positive distributions of node $v$, respectively. Suppose the parameters of the two di... as long as $\Psi\ge 0$, which is a random variable given by $\Psi = 2\Delta^\top\sqrt{\bar{\alpha}_{\dots}}\dots$
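
For orientation, the sub-linear positivity principle is usually stated in the negative-sampling literature as a power-law relation between the negative and positive densities. The following is a hedged restatement of that general principle, not a recovery of the truncated theorem text: the symbols $f_n$, $f_p$, and $\lambda$ are notation assumed here for the densities of $\mathbf{x}_n$ and $\mathbf{x}_p$ and for the exponent.

$$
f_n(\mathbf{x} \mid v) \;\propto\; f_p(\mathbf{x} \mid v)^{\lambda}, \qquad 0 < \lambda < 1 .
$$

Read this way, nodes that are more likely to be positives for $v$ are also more likely to be drawn as negatives, but the growth is damped by the exponent $\lambda < 1$.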

Figures (6)

  • Figure 1: Overall framework of DMNS.
  • Figure 2: Empirical distributions (histograms) of $\Psi$ on (a1--a4) Cora, (b1--b4) Citeseer, (c1--c4) Coauthor-CS, (d1--d4) Actor, across different time steps.
  • Figure 3: Ablation studies.
  • Figure 4: Parameter sensitivity.
  • Figure 5: Histogram of embedding distances from query.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1: Sub-linear Positivity Diffusion