Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning

Lequan Lin, Dai Shi, Andi Han, Zhiyong Wang, Junbin Gao

TL;DR

This work tackles the costly hyperparameter tuning required to push GNNs to top performance. It proposes GNN-Diff, a graph-conditioned latent diffusion framework that learns to generate high-quality GNN parameters from checkpoints produced by a lightweight coarse search, using a parameter autoencoder, a graph autoencoder, and a graph-conditioned diffusion model. Across 166 experiments on four tasks and 20 datasets, GNN-Diff consistently boosts performance, improves stability on unseen data, and reduces tuning time relative to grid or random search, especially on large and long-range graphs. The approach offers practical impact by enabling near- or better-than-grid performance with substantially lower tuning effort, and it lays groundwork for extending diffusion-based parameter generation to broader graph tasks and architectures.

Abstract

Graph Neural Networks (GNNs) are proficient in graph representation learning and achieve promising performance on diverse tasks such as node classification and link prediction. Comprehensive hyperparameter tuning is usually essential to unlock a GNN's top performance, especially for complicated tasks such as node classification on large graphs and long-range graphs. Such tuning typically incurs high computational and time costs and requires careful design of an appropriate search space. This work introduces a graph-conditioned latent diffusion framework (GNN-Diff) that generates high-performing GNNs from the model checkpoints of sub-optimal hyperparameters selected by a lightly tuned coarse search. We validate our method through 166 experiments across four graph tasks: node classification on small, large, and long-range graphs, as well as link prediction. Our experiments involve 10 classic and state-of-the-art target models and 20 publicly available datasets. The results consistently demonstrate that GNN-Diff: (1) boosts the performance of GNNs with efficient hyperparameter tuning; and (2) exhibits high stability and generalizability on unseen data across multiple generation runs. The code is available at https://github.com/lequanlin/GNN-Diff.
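To make the cost contrast between a full grid search and a coarse search concrete, the following minimal sketch counts the configurations each would have to train. The hyperparameter names and values in `full_space` and `coarse_space` are hypothetical and chosen only for illustration; they are not the search spaces used in the paper.

```python
from itertools import product

# Hypothetical search spaces (illustrative values, not the paper's).
full_space = {
    "lr": [0.1, 0.05, 0.01, 0.005, 0.001],
    "weight_decay": [0.0, 5e-4, 5e-3],
    "hidden": [16, 32, 64, 128],
    "dropout": [0.0, 0.2, 0.5, 0.8],
}
coarse_space = {
    "lr": [0.1, 0.01],
    "weight_decay": [5e-4],
    "hidden": [64],
    "dropout": [0.5],
}


def configurations(space):
    """Enumerate every combination of hyperparameter values in a search space."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


grid_configs = configurations(full_space)      # every combination must be trained
coarse_configs = configurations(coarse_space)  # only a handful must be trained

print(len(grid_configs), len(coarse_configs))
```

Under these assumed spaces, the grid search trains two orders of magnitude more models than the coarse search. That gap is the tuning cost the abstract refers to.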

Paper Structure

This paper contains 46 sections, 9 equations, 13 figures, 18 tables, 5 algorithms.

Figures (13)

  • Figure 1: Effect of hyperparameter search space on GCN performance for node classification. The test accuracy is averaged over 10 search runs. We also show the results of our method, GNN-Diff.
  • Figure 2: GNNs with parameters found by (a) grid search with the full search space, (b) random search with the sub-search space, (c) coarse search with the minimal search space, and (d) GNN-Diff generation based on sub-optimal hyperparameters selected by the coarse search. Top performance may be achieved with a large search space at the cost of time, or generated efficiently with GNN-Diff.
  • Figure 3: How GNN-Diff works for node classification. (1) Input graph data: input graph signals, adjacency matrix, and train/validation ground truth labels. (2) Parameter collection: we use a coarse search with a small search space to select an appropriate hyperparameter configuration for the target GNN, and then collect model checkpoints with this configuration. (3) Training: the parameter autoencoder (PAE) and graph autoencoder (GAE) are first trained to produce latent parameters and the graph condition, and then the graph-conditioned latent diffusion model (G-LDM) learns to recover latent parameters from white noise with the graph condition as guidance. (4) Inference: after sampling latent parameters from G-LDM, GNN parameters are reconstructed with the PAE decoder and returned to the target GNN for prediction.
  • Figure 4: (a) Scatter plot of validation versus test accuracy for grid, random, and coarse search. The black circle indicates the final models selected by validation. (b) Visualization of generated and input sample parameters after dimension reduction with Isomap. Each bubble represents one parameter sample, and the bubble size shows the corresponding test accuracy. (c) Kernel density estimation plot of test accuracy distributions corresponding to input samples, noised sample parameters, and GNN-Diff parameters.
  • Figure 5: GNN-Diff for GCN node classification on Cora with three generation and sampling strategies. Each row contains visualizations of (1) generated and input sample distributions of the first parameter in the last-layer bias; (2) generated and input sample distributions of the first parameter in the last-layer weights; (3) test accuracy of generated and input sample parameters.
  • ...and 8 more figures
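
The data flow in the Figure 3 caption can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation. It shows (a) flattening a collected checkpoint into one parameter vector and restoring it, and (b) the standard DDPM-style forward noising step. It applies the noise directly to the flattened vector and leaves out the PAE, the GAE, and the learned denoiser. The parameter names (`conv1.weight`, `conv1.bias`), the linear noise schedule, and all helper functions are assumptions made for this example.

```python
import math
import random


def flatten(checkpoint):
    """Concatenate named parameter lists into one flat vector and record the layout."""
    layout = [(name, len(values)) for name, values in checkpoint.items()]
    vector = [v for values in checkpoint.values() for v in values]
    return vector, layout


def unflatten(vector, layout):
    """Split a flat vector back into named parameters using the recorded layout."""
    checkpoint, start = {}, 0
    for name, size in layout:
        checkpoint[name] = vector[start:start + size]
        start += size
    return checkpoint


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def forward_noise(x0, t, betas, rng):
    """DDPM forward step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bar(t, betas)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


# Hypothetical toy checkpoint; real GNN checkpoints hold far larger tensors.
checkpoint = {"conv1.weight": [0.2, -0.1, 0.4, 0.3], "conv1.bias": [0.0, 0.1]}
vector, layout = flatten(checkpoint)

# Assumed linear noise schedule with 1000 steps.
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
rng = random.Random(0)

noised = forward_noise(vector, 1000, betas, rng)  # close to white noise at the final step
restored = unflatten(vector, layout)              # parameters returned to the target GNN
```

In the pipeline the caption describes, the noised and denoised quantities are the latent parameters produced by the PAE encoder. The PAE decoder output, rather than the raw vector, is what `unflatten` would reshape before the parameters are loaded into the target GNN.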