LOBSTUR: A Local Bootstrap Framework for Tuning Unsupervised Representations in Graph Neural Networks

So Won Jeong, Claire Donnat

TL;DR

We address hyperparameter sensitivity in unsupervised GNNs by introducing LOBSTUR-GNN, a local bootstrap framework that generates plausible, locally consistent graph replicas via graphon-based edge and feature resampling. Embedding stability across bootstrapped copies is quantified with a Canonical Correlation Analysis (CCA) alignment objective, enabling principled hyperparameter tuning without ground-truth labels. The paper provides theoretical consistency results for the resampling procedures, validates bootstrap samples by comparing their graph statistics to those of the original graph, and reports strong downstream performance on standard benchmarks, with substantial improvements over uninformed hyperparameter choices. The approach offers a data-driven path to robust unsupervised GNN representations with practical utility across scientific domains. Scaling to large graphs remains a challenge, for which the authors propose directions such as a block bootstrap.

Abstract

Graph Neural Networks (GNNs) are increasingly used in conjunction with unsupervised learning techniques to learn powerful node representations, but their deployment is hindered by their high sensitivity to hyperparameter tuning and the absence of established methodologies for selecting the optimal models. To address these challenges, we propose LOBSTUR-GNN (Local Bootstrap for Tuning Unsupervised Representations in GNNs), a novel framework designed to adapt bootstrapping techniques for unsupervised graph representation learning. LOBSTUR-GNN tackles two main challenges: (a) adapting the bootstrap edge and feature resampling process to account for local graph dependencies in creating alternative versions of the same graph, and (b) establishing robust metrics for evaluating learned representations without ground-truth labels. Using locally bootstrapped resampling and leveraging Canonical Correlation Analysis (CCA) to assess embedding consistency, LOBSTUR provides a principled approach for hyperparameter tuning in unsupervised GNNs. We validate the effectiveness and efficiency of our proposed method through extensive experiments on established academic datasets, showing a 65.9% improvement in classification accuracy compared to an uninformed selection of hyperparameters. Finally, we deploy our framework on a real-world application, thereby demonstrating its validity and practical utility in various settings. The code is available at https://github.com/sowonjeong/lobstur-graph-bootstrap.
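
The abstract's idea of using CCA to assess embedding consistency can be illustrated with a minimal sketch. It assumes two embedding matrices for the same nodes (for example, from models trained on two bootstrapped copies of a graph) and scores their agreement by the mean canonical correlation. The function names `cca_correlations` and `stability_score` are hypothetical, and the mean-correlation summary is an illustrative choice, not necessarily the objective used in the paper.

```python
import numpy as np

def cca_correlations(Z1, Z2):
    """Canonical correlations between two embeddings of the same n nodes.

    Assumes Z1 (n x d1) and Z2 (n x d2) have full column rank after centering.
    """
    # Center each embedding so that correlations are not driven by offsets.
    Z1 = Z1 - Z1.mean(axis=0)
    Z2 = Z2 - Z2.mean(axis=0)
    # Thin QR gives orthonormal bases for the two column spaces.
    Q1, _ = np.linalg.qr(Z1)
    Q2, _ = np.linalg.qr(Z2)
    # Singular values of Q1^T Q2 are the canonical correlations.
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def stability_score(Z1, Z2):
    """Mean canonical correlation (hypothetical summary; higher = more consistent)."""
    return float(np.mean(cca_correlations(Z1, Z2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 4))
    # An invertible linear map of the same embedding is perfectly aligned.
    print(stability_score(Z, Z @ rng.normal(size=(4, 4))))
    # An unrelated embedding gives a much lower score.
    print(stability_score(Z, rng.normal(size=(200, 4))))
```

Because CCA is invariant to invertible linear transformations of either embedding, a score of this kind compares the subspaces the embeddings span rather than their individual coordinates, which matters when separately trained models can differ by a rotation or rescaling.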

Paper Structure

This paper contains 44 sections, 2 theorems, 26 equations, 9 figures, 16 tables, 6 algorithms.

Key Result

Theorem 3.1

Assume that $\mathcal{G}_{knn}$, the directed $k$-nearest neighbor graph induced by the latent variables $\{U_i\}_{i=1}^n$, is known, with $k$ such that $\lim_{n \to \infty} \frac{k}{n} = 0$. Suppose $g$ is an $\alpha$-Hölder-continuous function on the interval $[0,1]$, so that there exists a constant ... where $\mathcal{N}_{knn}(i)$ denotes any of the $k$-nearest neighbors of $i$.
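
To make the setting of the theorem concrete, the sketch below builds directed $k$-nearest-neighbor lists from one-dimensional latent positions and resamples an adjacency matrix by replacing each node with a randomly chosen neighbor. This neighbor-swap scheme, the function names `knn_neighbors` and `local_bootstrap_adjacency`, and the use of known latent positions are all illustrative assumptions. The paper's actual resampling procedure may differ.

```python
import numpy as np

def knn_neighbors(U, k):
    """Directed kNN lists for 1-D latent positions U (shape (n,)); returns (n, k) indices."""
    D = np.abs(U[:, None] - U[None, :])
    np.fill_diagonal(D, np.inf)          # a node is never its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def local_bootstrap_adjacency(A, nbrs, rng):
    """Resample a symmetric 0/1 adjacency A via a neighbor swap (assumed scheme).

    Each node i draws one surrogate s(i) uniformly from its kNN list, and the
    resampled edge (i, j) copies the original edge (s(i), s(j)).
    """
    n, k = nbrs.shape
    surrogate = nbrs[np.arange(n), rng.integers(0, k, size=n)]
    A_star = A[np.ix_(surrogate, surrogate)].copy()
    np.fill_diagonal(A_star, 0)          # keep the graph free of self-loops
    return A_star

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, k = 60, 5
    U = rng.uniform(size=n)
    # Hypothetical smooth graphon: edge probability decays with latent distance.
    P = 0.8 * np.exp(-3.0 * np.abs(U[:, None] - U[None, :]))
    A = np.triu((rng.uniform(size=(n, n)) < P).astype(int), 1)
    A = A + A.T
    A_star = local_bootstrap_adjacency(A, knn_neighbors(U, k), rng)
    print(A.sum() // 2, A_star.sum() // 2)   # edge counts, original vs. resampled
```

The intuition matches the theorem's assumptions: when $g$ is smooth and $k/n \to 0$, a node's nearest neighbors have nearly the same connection probabilities as the node itself, so borrowing their edges yields a plausible replica of the graph.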

Figures (9)

  • Figure 1: Cora. Evaluation of the CCA-SSG embeddings [zhang2021canonical], an unsupervised learning method, for each combination of the hyperparameters (loss parameter $\lambda$, edge drop rate (EDR), feature mask rate (FMR)). Each entry denotes the mean and standard deviation of the node classification accuracy of a linear classifier trained on the learned representations (averaged over 20 experiments).
  • Figure 2: Illustration of different techniques for generating new copies of a simple graph (left-most image). The original graph has a distinctive community structure. Note that node sampling or edge sampling randomly removes either nodes or edges, disrupting the original graph structure.
  • Figure 3: Block Bootstrap for Mouse Spleen data. Distribution of graph statistics of bootstrapped graphs. The check is whether each statistic of the original graph falls in the extreme tails of its distribution over the generated samples. The red dotted line indicates the statistic computed on the original graph. Most of the original graph's statistics do not lie in the extreme tails of the bootstrap distribution.
  • Figure 4: Block Bootstrap for Mouse Spleen data. Distribution of node-level statistics of bootstrapped graphs. The orange distribution shows the Jensen-Shannon (JS) divergence between the bootstrapped samples and the original graph, and the blue distribution shows the divergence among bootstrapped samples. The more the two distributions overlap, the better the bootstrapped samples mimic the original graph in terms of node-level statistics.
  • Figure 5: Citeseer: Models trained with different hyperparameters. 2D visualization through PCA. The learned representations vary with the choice of hyperparameters.
  • ...and 4 more figures
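
The two validation checks described in the captions of Figures 3 and 4 can be sketched as follows. The first function measures how far into the tails of the bootstrap distribution an original graph statistic falls. The second computes a histogram-based Jensen-Shannon divergence between two samples of a node-level statistic. The function names, the two-sided tail summary, and the histogram binning are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def two_sided_tail_prob(boot_stats, observed):
    """Empirical two-sided tail probability of `observed` under bootstrap statistics.

    Values near 0 mean the original statistic sits in the extreme tails.
    """
    boot_stats = np.asarray(boot_stats, dtype=float)
    lower = np.mean(boot_stats <= observed)
    upper = np.mean(boot_stats >= observed)
    return float(min(1.0, 2.0 * min(lower, upper)))

def js_divergence(x, y, bins=20):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between histograms of x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    edges = np.linspace(lo, hi, bins + 1)      # shared bins for both samples
    p = np.histogram(x, bins=edges)[0].astype(float)
    q = np.histogram(y, bins=edges)[0].astype(float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                           # 0 * log(0) is treated as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Hypothetical example: a graph-level statistic over 500 bootstrap replicas.
    boot = rng.normal(loc=0.30, scale=0.02, size=500)
    print(two_sided_tail_prob(boot, observed=0.31))
    # Hypothetical node-level statistic (e.g. degrees) for two graphs.
    print(js_divergence(rng.poisson(5, 1000), rng.poisson(5, 1000)))
```

In the spirit of Figure 4, one would compute `js_divergence` both between each bootstrapped graph and the original and between pairs of bootstrapped graphs, then compare the two resulting distributions.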

Theorems & Definitions (12)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Remark 3.3
  • Remark 4.1
  • Remark 4.2
  • Definition A.1: Hölder class for Graphon functions (from gao2015rate)
  • Definition A.2: Distance Measures
  • Definition A.3: k-NN Neighborhoods
  • ...and 2 more