Multi-view Graph Condensation via Tensor Decomposition

Nícolas Roque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. Papalexakis

TL;DR

The paper tackles the scalability of graph neural networks on large graphs by formulating graph condensation as a tensor-decomposition problem. It introduces GCTD, which builds a multi-view augmented adjacency tensor and applies a nonnegative RESCAL decomposition to obtain a latent factor matrix $\mathbf U$ and a core tensor $\boldsymbol{\mathscr R}$. A condensed graph is then derived by averaging $\boldsymbol{\mathscr R}$ across views and applying K-Means clustering to $\mathbf U$. Synthetic nodes, features, and labels are computed with a focus on preserving underrepresented splits and classes, enabled by hard nonnegativity constraints and ReLU activations. Across six real-world datasets, GCTD achieves accuracy gains of up to 4.0% on three of them and delivers competitive results on large graphs, demonstrating the scalability, interpretability, and robustness of tensor-based graph condensation.
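The decomposition step can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it fits $\mathbf X_k \approx \mathbf U \mathbf R_k \mathbf U^\top$ for every frontal slice $k$ of the tensor by minimizing the squared Frobenius error with Lee-Seung-style multiplicative updates, which keep $\mathbf U$ and $\boldsymbol{\mathscr R}$ nonnegative without any projection. The two augmented views (self-loops and two-hop links), the function names `build_multiview_tensor` and `nonnegative_rescal`, the random initialization, and the iteration count are all hypothetical choices made for illustration; the paper's augmentations and solver may differ.

```python
import numpy as np


def build_multiview_tensor(adj: np.ndarray) -> np.ndarray:
    """Stack the adjacency matrix with two HYPOTHETICAL augmented views."""
    n = adj.shape[0]
    # Hypothetical view 1: add self-loops (binarized).
    with_self_loops = np.minimum(adj + np.eye(n), 1.0)
    # Hypothetical view 2: binarized one- and two-hop links, no self-loops.
    two_hop = np.minimum(adj @ adj + adj, 1.0)
    np.fill_diagonal(two_hop, 0.0)
    # Shape (K, n, n): frontal slice k holds one view of the graph.
    return np.stack([adj, with_self_loops, two_hop])


def nonnegative_rescal(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Fit X[k] ~ U @ R[k] @ U.T with U, R >= 0 via multiplicative updates."""
    rng = np.random.default_rng(seed)
    K, n, _ = X.shape
    U = rng.random((n, rank))
    R = rng.random((K, rank, rank))
    for _ in range(n_iter):
        # U update: ratio of the negative and positive parts of the gradient,
        # summed over all views. Every term is nonnegative, so U stays >= 0.
        UtU = U.T @ U
        num = np.zeros_like(U)
        den = np.zeros_like(U)
        for k in range(K):
            num += X[k] @ U @ R[k].T + X[k].T @ U @ R[k]
            den += U @ (R[k] @ UtU @ R[k].T + R[k].T @ UtU @ R[k])
        U *= num / (den + eps)
        # R update: one multiplicative step per frontal slice of the core.
        UtU = U.T @ U
        for k in range(K):
            R[k] *= (U.T @ X[k] @ U) / (UtU @ R[k] @ UtU + eps)
    return U, R


def reconstruction_error(X, U, R):
    """Relative Frobenius error of the reconstruction U R_k U^T over all k."""
    approx = np.einsum("ir,krs,js->kij", U, R, U)
    return np.linalg.norm(X - approx) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    upper = np.triu(rng.random((30, 30)) < 0.15, 1).astype(float)
    A = upper + upper.T  # symmetric random graph without self-loops
    X = build_multiview_tensor(A)
    U, R = nonnegative_rescal(X, rank=5)
    print(U.shape, R.shape)  # (30, 5) (3, 5, 5)
    print(f"relative error: {reconstruction_error(X, U, R):.3f}")
```

Multiplicative updates are only one way to enforce nonnegativity; the TL;DR's mention of ReLU activations suggests the authors may enforce it differently, for example with gradient steps followed by clipping at zero.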

Abstract

Graph Neural Networks (GNNs) have demonstrated remarkable results in various real-world applications, including drug discovery, object detection, social media analysis, recommender systems, and text classification. Despite this potential, training them on large-scale graphs poses significant computational challenges because of the resources required for storage and processing. Graph Condensation has emerged as a promising solution to reduce these demands by learning a compact synthetic graph that preserves the essential information of the original one while maintaining the GNN's predictive performance. Although effective, current graph condensation approaches frequently rely on computationally intensive bi-level optimization. Moreover, they fail to maintain a mapping between synthetic and original nodes, limiting the interpretability of the model's decisions. Meanwhile, a wide range of decomposition techniques have been applied to learn linear or multi-linear functions from graph data, offering a more transparent and less resource-intensive alternative. However, their applicability to graph condensation remains unexplored. This paper addresses this gap and proposes a novel method called Multi-view Graph Condensation via Tensor Decomposition (GCTD) to investigate the extent to which such techniques can synthesize an informative smaller graph and achieve comparable downstream task performance. Extensive experiments on six real-world datasets demonstrate that GCTD effectively reduces graph size while preserving GNN performance, achieving up to a 4.0% improvement in accuracy on three out of six datasets and competitive performance on large graphs compared to existing approaches. Our code is available at https://anonymous.4open.science/r/gctd-345A.

Paper Structure

This paper contains 11 sections, 9 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Pipeline of GCTD. We construct a tensor by augmenting the graph's adjacency matrix $\mathbf{A}^{\mathcal{T}}$ and stacking the augmented matrices together with $\mathbf{A}^{\mathcal{T}}$ along the third dimension. Then, we apply nonnegative RESCAL to this tensor to extract the low-rank factor matrix $\mathbf{U}$ and a multi-view condensed graph $\boldsymbol{\mathscr{R}}$. Lastly, we obtain a condensed graph by aggregating $\boldsymbol{\mathscr{R}}$ along the third mode, and we compute the features and label of each synthetic node by applying K-Means to $\mathbf{U}$.
  • Figure 2: Accuracy scores achieved by our proposed method on graphs with varying numbers of views. The value following each dataset name is the condensation ratio applied in this ablation study. Each experiment was run ten times; we report the average accuracy with error bars.
  • Figure 3: Accuracy of GCTD using K-Means versus Argmax to compute synthetic node assignments from the factor matrix $\mathbf{U}$. In this ablation study, we used Citeseer, Cora, Pubmed, and Flickr with condensation ratios of 1.8%, 2.6%, 0.15%, and 0.5%, respectively.
  • Figure 4: Visualization of condensed graphs generated by GCTD. Each node represents a synthetic node, with its color indicating the corresponding class.
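Once the factors are available, the final condensation step in Figure 1 amounts to averaging the core slices and mapping original nodes to synthetic ones. The sketch below is hypothetical. It uses Argmax assignment (the alternative ablated in Figure 3) rather than K-Means, because the argmax of row $i$ of $\mathbf U$ directly indexes a row and column of $\boldsymbol{\mathscr R}$, whereas K-Means cluster indices would need an extra alignment step that is not described here. Mean-pooled features and majority-vote labels are plain stand-ins; the paper's rules, which emphasize underrepresented splits and classes, are not reproduced. The function name `condense_from_factors` is illustrative.

```python
import numpy as np


def condense_from_factors(U, R, features, labels):
    """Build a condensed graph from nonnegative RESCAL factors (a sketch).

    U: (n, r) nonnegative latent factors; R: (K, r, r) core tensor.
    ASSUMPTIONS: argmax assignment, mean-pooled features, majority labels.
    """
    r = U.shape[1]
    # Condensed adjacency: average the core tensor's frontal slices over views.
    adj_syn = R.mean(axis=0)
    # Map each original node to the latent component with its largest loading.
    assign = U.argmax(axis=1)
    feats_syn = np.zeros((r, features.shape[1]))
    labels_syn = np.full(r, -1)  # -1 marks a synthetic node with no members
    for j in range(r):
        members = np.flatnonzero(assign == j)
        if members.size == 0:
            continue
        feats_syn[j] = features[members].mean(axis=0)
        labels_syn[j] = np.bincount(labels[members]).argmax()
    return adj_syn, feats_syn, labels_syn, assign
```

Because `assign` records which synthetic node each original node maps to, this sketch also keeps the synthetic-to-original mapping that the abstract identifies as a source of interpretability.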