TinyGraph: Joint Feature and Node Condensation for Graph Neural Networks

Yezi Liu; Yanning Shen

TL;DR

TinyGraph tackles the challenge of training GNNs on large graphs with high-dimensional features by jointly condensing nodes and features. It leverages a structure-aware feature condenser based on Graph Attention Networks and a gradient-matching objective to align training trajectories between the original and condensed graphs, avoiding costly nested optimization via curriculum-based updates. The approach demonstrates strong empirical performance, retaining near full-feature accuracy with substantial reductions in node count and feature dimensionality across multiple datasets, while offering scalable and architecture-agnostic transferability. Overall, TinyGraph provides a practical framework for data-efficient GNN training on large-scale graphs with high-dimensional nodal features, delivering major storage and computation savings with competitive accuracy.

Abstract

Training graph neural networks (GNNs) on large-scale graphs can be challenging due to the high computational expense caused by the massive number of nodes and high-dimensional nodal features. Existing graph condensation studies tackle this problem only by reducing the number of nodes in the graph. However, the resulting condensed graph data can still be cumbersome. Specifically, although the Citeseer dataset can be reduced to 0.9% of its nodes (30 nodes) for training, each node still has 3,703 features, far more than the number of training samples. Faced with this challenge, we study the problem of joint condensation for both features and nodes in large-scale graphs. This task is challenging mainly because 1) the intertwined nature of the node features and the graph structure calls for a structure-aware feature condensation solver; and 2) useful information is difficult to retain in the condensed graph. To address these challenges, we propose a novel framework, TinyGraph, that condenses features and nodes simultaneously in graphs. Specifically, we cast the problem as matching the gradients of GNN weights trained on the condensed graph with the gradients obtained from training on the original graph, where the feature condensation is achieved by a trainable function. The condensed graph obtained by minimizing the matching loss along the training trajectory can thus retain critical information in the original graph. Extensive experiments were carried out to demonstrate the effectiveness of the proposed TinyGraph. For example, a GNN trained with TinyGraph retains 98.5% and 97.5% of the original test accuracy on the Cora and Citeseer datasets, respectively, while reducing the number of nodes by 97.4% and 98.2%, and the number of features by 90.0% on both datasets.
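The gradient-matching idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes a one-layer linear GNN (logits = Â X W), a fixed random projection applied after one propagation step as a stand-in for the trainable GAT-based feature condenser, and a column-wise cosine distance as the matching loss. All names (`weight_gradient`, `matching_loss`, `proj`, the toy sizes) are hypothetical. The sketch only evaluates the loss once; in the actual method the condensed graph and the condenser would be updated to minimize it along the training trajectory.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees >= 1 due to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weight_gradient(adj, feats, labels_onehot, weights):
    """Gradient of mean cross-entropy w.r.t. W for logits = A_hat @ X @ W."""
    h = normalize_adj(adj) @ feats
    probs = softmax(h @ weights)
    return h.T @ (probs - labels_onehot) / feats.shape[0]

def matching_loss(grad_a, grad_b, eps=1e-12):
    """Sum over weight columns of (1 - cosine similarity). Assumed distance."""
    num = (grad_a * grad_b).sum(axis=0)
    den = np.linalg.norm(grad_a, axis=0) * np.linalg.norm(grad_b, axis=0) + eps
    return float((1.0 - num / den).sum())

rng = np.random.default_rng(0)
n, d, n_small, d_small, c = 50, 40, 5, 4, 3  # toy sizes, chosen arbitrarily

# Toy "original" graph: symmetric random adjacency, dense features, random labels.
adj = (rng.random((n, n)) < 0.1).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T
feats = rng.normal(size=(n, d))
labels = np.eye(c)[rng.integers(0, c, size=n)]

# Stand-in feature condenser (NOT the paper's GAT): propagate once, then project d -> d_small.
proj = rng.normal(size=(d, d_small)) / np.sqrt(d)
feats_low = normalize_adj(adj) @ feats @ proj

# Toy condensed graph: few nodes, few features, no edges.
adj_small = np.zeros((n_small, n_small))
feats_small = rng.normal(size=(n_small, d_small))
labels_small = np.eye(c)[np.arange(n_small) % c]

# Both graphs share one weight matrix, so their gradients have the same shape.
W = rng.normal(size=(d_small, c))
grad_orig = weight_gradient(adj, feats_low, labels, W)
grad_cond = weight_gradient(adj_small, feats_small, labels_small, W)
loss = matching_loss(grad_orig, grad_cond)
print(f"gradient-matching loss: {loss:.4f}")
```

Because the condenser maps the original features to the same reduced dimension as the condensed features, a single set of GNN weights can be applied to both graphs, which is what makes the two gradients directly comparable.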

Paper Structure (23 sections, 10 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Problem Formulation
Proposed Algorithm
Structure-aware feature condensation
Gradient Matching
Model Optimization
Discussion on the Differences from Related Studies
Experiments
Experimental setup.
Can TinyGraph achieve comparable performance with baselines using the original features?
Can TinyGraph achieve better performance compared to structure-agnostic feature condensation baselines?
How many features are needed for TinyGraph to achieve equal performance with full-feature baselines?
What is the effect of GAT compared to other feature condensation functions? --- An Ablation study.
How does TinyGraph perform with various GNN models? --- A Generalizability Analysis.
Will different feature condensation functions work with different GNN architectures for gradient matching?
...and 8 more sections

Figures (5)

Figure 1: The goal of our proposed TinyGraph, which learns a condensed graph with much smaller node and feature sizes from a large graph. For example, TinyGraph is able to condense Citeseer with a data reduction of $90.0\%$ on features, $98.2\%$ on node size, and $99.7\%$ in storage.
Figure 2: An overview of the TinyGraph framework for addressing the joint condensation problem via gradient matching. The objective is to learn a condensed 'tiny' graph that can be used to train a GNN, achieving performance comparable to training on the original graph. Both the feature condensation function and the gradient matching GNN are trainable.
Figure 3: Performance comparison of our proposed method and baseline methods under various feature condensation ratios $r_d$, with the node condensation ratio fixed to $2.6\%$, $1.8\%$, $0.5\%$, $0.1\%$, and $0.25\%$ on Cora, Citeseer, Flickr, Reddit, and Arxiv, respectively.
Figure 4: Cross-architecture performance is shown in test accuracy (%). SAGE: GraphSAGE. Graphs condensed by different feature condensation functions all show strong transfer performance on other gradient-matching GNNs.
Figure 5: t-SNE visualization of original node features and condensed node features learned by TinyGraph, on Cora and Citeseer, with $r_d=10\%$ and $r_d=20\%$ for both datasets. Different colors represent different classes. The distinctly separable clusters in the t-SNE of condensed features demonstrate the discriminative capability of TinyGraph.
