SPARC: Spectral Architectures Tackling the Cold-Start Problem in Graph Learning

Yahel Jacobs, Reut Dayan, Uri Shaham

TL;DR

SPARC addresses the practical cold-start problem in graph learning by mapping node features into the Laplacian eigenspace through a generalizable neural map $\mathcal{F}_{\theta}$. This spectral embedding enables predictions for new nodes without adjacency information, allowing existing GNNs, transformers, and Mamba-based models to operate in dynamic, real-world graphs. The framework is instantiated in SPARC-GCN, SPARCphormer, and SAMBA, and extended to clustering, link prediction, and mini-batching, with strong empirical gains on cold-start classification and competitive performance on other tasks. The work highlights the practicality of spectral embeddings for real-world graph dynamics, while noting dependence on feature quality as a potential limitation.
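For orientation, the sketch below computes a classical Laplacian spectral embedding directly from an adjacency matrix. This is the kind of target that $\mathcal{F}_{\theta}$ is described as approximating from node features alone. It is a minimal illustration under stated assumptions, not the authors' implementation: the unnormalized Laplacian $L = D - A$, the toy graph, and the function name `spectral_embedding` are all chosen here for illustration, and the paper may use a different Laplacian normalization.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, k: int) -> np.ndarray:
    """Embed each node using the k eigenvectors of L = D - A that follow
    the trivial constant eigenvector (assumes a connected, undirected graph)."""
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Column 0 is the constant eigenvector (eigenvalue 0); skip it.
    return eigvecs[:, 1:k + 1]


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

emb = spectral_embedding(adj, k=2)
# The first embedding coordinate (the Fiedler vector) takes one sign on
# nodes 0-2 and the opposite sign on nodes 3-5, so it separates the triangles.
print(np.sign(emb[:, 0]))
```

Computing this embedding requires the adjacency matrix, which is exactly what a cold-start node lacks. A learned feature-to-embedding map removes that requirement at inference time.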

Abstract

Graphs play a central role in modeling complex relationships in data, yet most graph learning methods falter when faced with cold-start nodes--new nodes lacking initial connections--due to their reliance on adjacency information. To tackle this, we propose SPARC, a groundbreaking framework that introduces a novel approach to graph learning by utilizing generalizable spectral embeddings. With a simple yet powerful enhancement, SPARC empowers state-of-the-art methods to make predictions on cold-start nodes effectively. By eliminating the need for adjacency information during inference and effectively capturing the graph's structure, we make these methods suitable for real-world scenarios where new nodes frequently appear. Experimental results demonstrate that our framework outperforms existing models on cold-start nodes across tasks such as node classification, node clustering, and link prediction. SPARC provides a solution to the cold-start problem, advancing the field of graph learning.

Paper Structure

This paper contains 36 sections, 10 equations, 8 figures, 9 tables, 4 algorithms.

Figures (8)

  • Figure 1: Comparative evaluation of neighborhood prediction accuracy across diverse datasets. The bar chart measures accuracy as the overlap between each node's close neighbors in a given representation and its actual neighbors in the graph. We assessed both the spectral embedding and the feature-based method of Equation \ref{eq:featue_based} on four datasets: Cora, Citeseer, PubMed, and Reddit. The spectral embedding shows consistently strong accuracy across all datasets, whereas the feature-based method falls short, with variability reflecting how much dataset-specific structural information the features capture.
  • Figure 2: Overview of the SPARC framework. (a) Training and inference of $\mathcal{F}_\theta$. Left: during training, $\mathcal{F}_\theta$ uses node features and adjacency information to learn the Laplacian eigenfunctions, which are used to embed the graph in a Euclidean space. Right: at inference, $\mathcal{F}_\theta$ processes the features of cold-start nodes to approximate their spectral embeddings, enabling prediction of their neighborhoods based solely on node features. (b) Before SPARC: a GNN model trained on a fixed graph $\mathcal{G}$ relies on adjacency information during inference and therefore fails to predict for newly introduced cold-start nodes $v_{\text{cold}} \notin \mathcal{V}$, as indicated by the red arrow. (c) With SPARC: the enhanced model $\text{GNN}^*$, modified with the SPARC framework, successfully performs inference on cold-start nodes, as shown by the green arrow, by using spectral embeddings instead of adjacency information.
  • Figure 3: Cold-start clustering accuracy. Comparison of node clustering accuracies on the Citeseer and PubMed datasets for both connected and cold-start nodes. The bar charts show accuracies for connected nodes (light blue bars) and cold-start nodes (blue bars). SSGC, marked with red crosses, fails to cluster cold-start nodes, while FB and R-GAE exhibit similar accuracy trends. Accuracy was measured using the Hungarian matching algorithm with node labels as ground truth.
  • Figure 4: Convergence rates on ClusterGCN and the Reddit dataset. Training ClusterGCN on the Reddit dataset with three different mini-batching methods. The spectral clustered mini-batches result in faster convergence during the training process.
  • Figure 5: Cold-start proportion analysis. Accuracy trends of SPARCphormer, GraphSAGE, and the feature-based (FB) transformer as the proportion of cold-start nodes increases on the Reddit dataset. The FB transformer, which does not incorporate adjacency information and is therefore unaffected by the cold-start ratio, performs consistently at a lower accuracy level.
  • ...and 3 more figures
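
The Figure 3 caption states that clustering accuracy was measured with the Hungarian matching algorithm against node labels. A minimal sketch of that metric follows, assuming the common formulation in which cluster IDs are matched one-to-one to labels so as to maximize agreement. The function name `clustering_accuracy` and the toy labels are hypothetical, and the paper's exact evaluation code may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred) -> float:
    """Accuracy under the best one-to-one mapping of cluster IDs to labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # contingency[i, j] = number of nodes in cluster i that carry label j.
    contingency = np.zeros((clusters.size, classes.size), dtype=np.int64)
    for i, c in enumerate(clusters):
        for j, label in enumerate(classes):
            contingency[i, j] = np.sum((y_pred == c) & (y_true == label))
    # Hungarian algorithm: choose the assignment with maximal total agreement.
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return contingency[rows, cols].sum() / y_true.size


# Hypothetical toy example: cluster IDs are arbitrary, and one node is misplaced.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 1, 1, 0]
acc = clustering_accuracy(y_true, y_pred)
print(acc)  # 0.875
```

Because cluster IDs carry no inherent meaning, the matching step is what makes the score comparable across methods. Without it, a perfect clustering with permuted IDs would score near zero.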