Early-Bird GCNs: Graph-Network Co-Optimization Towards More Efficient GCN Training and Inference via Drawing Early-Bird Lottery Tickets

Haoran You, Zhihan Lu, Zijian Zhou, Yonggan Fu, Yingyan Celine Lin

TL;DR

This work identifies graph early-bird (GEB) tickets in sparsified GCN graphs and shows that such tickets emerge at very early training stages, where a simple detector can identify them automatically. It introduces GEBT, a framework for graph-model co-sparsification that draws joint early-bird (joint-EB) tickets covering both the graph structure and the GCN weights. In experiments across multiple GCN models and datasets, GEBT saves up to 80.2% ~ 85.6% of training cost and 84.6% ~ 87.5% of inference cost, with accuracy comparable to or better than state-of-the-art methods. The results offer a practical path to efficient GCN training and deployment on large real-world graphs, and the source code is publicly released.

Abstract

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs. However, it remains notoriously challenging to train GCNs and run inference with them over large graph datasets, limiting their application to large real-world graphs and hindering the exploration of deeper and more sophisticated GCN graphs. This is because as the graph size grows, the sheer number of node features and the large adjacency matrix can easily explode the required memory and data movements. To tackle these challenges, we explore the possibility of drawing lottery tickets when sparsifying GCN graphs, i.e., subgraphs that largely shrink the adjacency matrix yet are capable of achieving accuracy comparable to or even better than their full graphs. Specifically, we discover for the first time the existence of graph early-bird (GEB) tickets that emerge at the very early stage when sparsifying GCN graphs, and propose a simple yet effective detector to automatically identify the emergence of such GEB tickets. Furthermore, we advocate graph-model co-optimization and develop a generic efficient GCN early-bird training framework dubbed GEBT that can significantly boost the efficiency of GCN training by (1) drawing joint early-bird tickets between the GCN graphs and models and (2) enabling simultaneous sparsification of both the GCN graphs and models. Experiments on various GCN models and datasets consistently validate our GEB finding and the effectiveness of our GEBT, e.g., our GEBT achieves up to 80.2% ~ 85.6% and 84.6% ~ 87.5% savings of GCN training and inference costs, respectively, while offering comparable or even better accuracy compared to state-of-the-art methods. Our source code and supplementary appendix are available at https://github.com/RICE-EIC/Early-Bird-GCN.
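
The abstract describes the GEB detector only as "simple yet effective", so the following is a minimal sketch of one way such a detector could work, not the authors' implementation. It assumes that edges are pruned by the magnitude of a per-edge score, that the "graph distance" is a normalized Hamming distance between pruning masks from different epochs, and that a ticket is declared once the mask stays within a tolerance of its recent history. The names `prune_mask`, `mask_distance`, `EarlyBirdDetector`, `window`, and `eps` are all hypothetical.

```python
from collections import deque

import numpy as np


def prune_mask(scores, prune_ratio):
    """Boolean mask that drops the `prune_ratio` fraction of lowest-magnitude entries."""
    flat = np.abs(scores).ravel()
    n_drop = int(round(prune_ratio * flat.size))
    order = np.argsort(flat, kind="stable")  # ascending by magnitude
    mask = np.ones(flat.size, dtype=bool)
    mask[order[:n_drop]] = False             # drop exactly n_drop entries
    return mask.reshape(scores.shape)


def mask_distance(a, b):
    """Normalized Hamming distance: fraction of positions where two masks differ."""
    return float(np.mean(a != b))


class EarlyBirdDetector:
    """Flags an epoch once the pruning mask stops changing (assumed criterion)."""

    def __init__(self, prune_ratio, window=3, eps=0.1):
        self.prune_ratio = prune_ratio
        self.eps = eps
        self.history = deque(maxlen=window)  # masks from the last `window` epochs

    def update(self, scores):
        mask = prune_mask(scores, self.prune_ratio)
        # Stable only if the history is full and every stored mask is close to this one.
        stable = len(self.history) == self.history.maxlen and all(
            mask_distance(mask, old) < self.eps for old in self.history
        )
        self.history.append(mask)
        return stable


# Toy usage: edge scores that converge to a fixed target as the noise decays.
rng = np.random.default_rng(0)
target = rng.normal(size=(8, 8))
detector = EarlyBirdDetector(prune_ratio=0.3)
for epoch in range(50):
    noise = rng.normal(size=target.shape) * (0.5 ** epoch)
    if detector.update(target + noise):
        print(f"mask stabilized at epoch {epoch}")
        break
```

In a real training loop, `scores` would come from whatever quantity ranks edges for pruning (for example, learned edge weights on the adjacency matrix), and the same detector could in principle be applied to weight masks to look for joint tickets. Both uses are assumptions here rather than details taken from the paper.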

Paper Structure

This paper contains 12 sections, 3 equations, 8 figures, 4 tables, 2 algorithms.

Figures (8)

  • Figure 1: Retraining accuracy vs. epoch numbers at which subgraphs are drawn, when evaluating the GCNs kipf2017semi on three graph datasets: Cora, CiteSeer, and PubMed, where dashed lines show the accuracy of GCNs on corresponding unpruned full graphs, $p_g$ denotes the graph pruning ratios, and error bars show the minimum and maximum of ten runs.
  • Figure 2: Visualization of (a) pairwise graph distance matrices, and (b) the evolution of the recorded graph distance along the training trajectories under different graph pruning ratios.
  • Figure 3: Retraining accuracy vs. inference FLOPs of our co-sparsification framework and a SOTA graph sparsification framework, SGCN li2020sgcn.
  • Figure 4: (a) Retraining accuracy vs. epoch numbers at which both the subgraphs and subnetworks (i.e., joint-EB tickets) are drawn, for GCN networks kipf2017semi on Cora and CiteSeer datasets, where $p_g$ indicates the graph pruning ratio and $p_w$ denotes the network pruning ratio, and (b) the distance's evolution along the training trajectories under different graph and network pruning ratio pairs.
  • Figure 5: An overview of the existing efficient GCN training pipeline and our GEBT training schemes via drawing GEB tickets and joint-EB tickets (red circle denotes the training process).
  • ...and 3 more figures