Diffusion on Graph: Augmentation of Graph Structure for Node Classification
Yancheng Wang, Changyu Liu, Yingzhen Yang
TL;DR
Diffusion on Graph (DoG) augments node-level learning by generating synthetic graph structures, that is, synthetic nodes and their associated edges, within a single graph. It combines a Graph Autoencoder (GAE) with a Latent Diffusion Model (LDM) trained in latent space with Classifier-Free Guidance (CFG), and uses a Bi-Level Neighbor Map Decoder (BLND) to reconstruct edges efficiently; the synthetic structures are then merged with the original graph to form an augmented graph. To mitigate the noise introduced by the synthetic structures, DoG trains GNNs with a low-rank regularization term based on a truncated nuclear norm, with theoretical guarantees on test loss. Experiments on multiple benchmarks, including large-scale graphs, show improvements on node classification and graph contrastive learning. The method is designed to be orthogonal to existing node-level augmentation techniques, and an open-source implementation is available.
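As a rough illustration of the quantity behind the low-rank regularization, the sketch below computes a truncated nuclear norm in NumPy, taken here to mean the sum of a matrix's singular values beyond its `rank` largest ones. The function name, the value of `rank`, and the toy matrices are assumptions made for illustration; which matrix DoG regularizes and how the term enters the training loss are not specified in this summary.

```python
import numpy as np

def truncated_nuclear_norm(matrix: np.ndarray, rank: int) -> float:
    """Sum of the singular values of `matrix` beyond the `rank` largest ones."""
    # np.linalg.svd returns singular values sorted in descending order.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values[rank:].sum())

# Hypothetical toy data: an exactly rank-3 matrix and a noisy copy of it.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
noisy = low_rank + 0.1 * rng.normal(size=low_rank.shape)

clean_penalty = truncated_nuclear_norm(low_rank, rank=3)
noisy_penalty = truncated_nuclear_norm(noisy, rank=3)
```

The penalty is close to zero for a matrix whose rank does not exceed `rank` and grows when noise spreads energy into the trailing singular values, which is why penalizing it favors low-rank structure.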
Abstract
Graph diffusion models have recently been proposed to synthesize entire graphs, such as molecule graphs. Although existing methods perform well at generating entire graphs for graph-level learning tasks, no graph diffusion model has been developed to generate synthetic graph structures, that is, synthetic nodes and associated edges within a given graph, for node-level learning tasks. Inspired by work in computer vision that uses synthetic data to improve performance, we propose Diffusion on Graph (DoG), which generates synthetic graph structures to boost the performance of graph neural networks (GNNs). The synthetic graph structures generated by DoG are combined with the original graph to form an augmented graph for training on node-level learning tasks, such as node classification and graph contrastive learning (GCL). To improve the efficiency of the generation process, DoG introduces a Bi-Level Neighbor Map Decoder (BLND). To mitigate the adverse effect of the noise introduced by the synthetic graph structures, a low-rank regularization method is proposed for training GNNs on the augmented graphs. Extensive experiments on various graph datasets for semi-supervised node classification and GCL demonstrate the effectiveness of DoG with low-rank regularization. The code of DoG is available at https://github.com/Statistical-Deep-Learning/DoG.
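To make the idea of an augmented graph concrete, the following minimal sketch appends synthetic nodes and their edges to an existing graph. It assumes a dense, undirected adjacency matrix and treats the synthetic features and edges as already generated; the function `build_augmented_graph` and all of its argument names are hypothetical and are not taken from the DoG code base.

```python
import numpy as np

def build_augmented_graph(adj, feats, syn_feats, syn_to_orig, syn_to_syn):
    """Append synthetic nodes to a graph given as a dense adjacency matrix.

    adj:         (n, n) adjacency of the original graph
    feats:       (n, d) original node features
    syn_feats:   (m, d) synthetic node features
    syn_to_orig: (m, n) edges from synthetic nodes to original nodes
    syn_to_syn:  (m, m) edges among synthetic nodes
    """
    # Original nodes keep indices 0..n-1; synthetic nodes take n..n+m-1.
    aug_adj = np.block([[adj, syn_to_orig.T],
                        [syn_to_orig, syn_to_syn]])
    aug_adj = np.maximum(aug_adj, aug_adj.T)  # keep the graph undirected
    aug_feats = np.vstack([feats, syn_feats])
    return aug_adj, aug_feats

# Toy example: a 3-node path graph plus one synthetic node linked to nodes 0 and 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
feats = np.eye(3)
syn_feats = np.array([[0.5, 0.0, 0.5]])
syn_to_orig = np.array([[1, 0, 1]])
syn_to_syn = np.zeros((1, 1), dtype=int)

aug_adj, aug_feats = build_augmented_graph(adj, feats, syn_feats, syn_to_orig, syn_to_syn)
```

A GNN would then be trained on `aug_adj` and `aug_feats` in place of the original graph. A real large-scale graph would call for a sparse representation rather than the dense block matrix used here.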
