Pre-Training Identification of Graph Winning Tickets in Adaptive Spatial-Temporal Graph Neural Networks

Wenying Duan; Tianxiang Fang; Hong Rao; Xiaoxi He

TL;DR

This work addresses the high computational burden of Adaptive Spatial-Temporal Graph Neural Networks (ASTGNNs) by introducing Graph Winning Tickets (GWTs), inspired by the Lottery Ticket Hypothesis. It identifies a star topology as a GWT before training, enabling efficient two-hop message passing that reduces graph-convolution complexity from $\mathcal{O}(N^2)$ toward $\mathcal{O}(N)$ while preserving global spatial-temporal propagation. Empirical results across multiple large-scale datasets show that ASTGNNs trained on the star GWT achieve comparable or superior performance to full-graph baselines with substantially lower training and inference time, and that the approach even enables training on datasets where full graphs fail due to memory constraints. The approach is supported by theoretical analysis based on spectral graph approximation and by practical enhancements (GWT-AGCN, averaged central-node initialization, and an efficiency-oriented message-passing design). This work broadens the applicability of the Lottery Ticket Hypothesis to resource-constrained graph learning, offering a scalable path for deploying ASTGNNs in large-scale spatial-temporal forecasting tasks.

Abstract

In this paper, we present a novel method to significantly enhance the computational efficiency of Adaptive Spatial-Temporal Graph Neural Networks (ASTGNNs) by introducing the concept of the Graph Winning Ticket (GWT), derived from the Lottery Ticket Hypothesis (LTH). By adopting a pre-determined star topology as a GWT prior to training, we balance edge reduction with efficient information propagation, reducing computational demands while maintaining high model performance. Both the time and memory complexity of generating adaptive spatial-temporal graphs are reduced from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$. Our approach streamlines ASTGNN deployment by eliminating the need for exhaustive training, pruning, and retraining cycles, and demonstrates empirically across various datasets that it is possible to achieve performance comparable to full models at substantially lower computational cost. Specifically, our approach enables training ASTGNNs on the largest-scale spatial-temporal dataset using a single A6000 equipped with 48 GB of memory, overcoming the out-of-memory issue encountered during original training and even achieving state-of-the-art performance. Furthermore, we analyze the effectiveness of the GWT from the perspective of spectral graph theory, providing substantial theoretical support. This advancement not only proves the existence of efficient sub-networks within ASTGNNs but also broadens the applicability of the LTH in resource-constrained settings, marking a significant step forward in the field of graph neural networks. Code is available at https://anonymous.4open.science/r/paper-1430.
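The linear cost claimed for the star topology can be illustrated with a minimal sketch. It is not the paper's GWT-AGCN: it assumes uniform, row-normalized edge weights instead of learned adaptive ones, and the helper names `star_adjacency` and `star_propagate` are hypothetical. One propagation step over a star touches only the $N-1$ edges incident to the central node, and applying it twice reproduces two hops of the dense $N \times N$ product.

```python
import numpy as np


def star_adjacency(n, center=0):
    """Dense, row-normalized adjacency of a star graph (reference only, O(N^2) memory)."""
    a = np.zeros((n, n))
    a[center, :] = 1.0
    a[:, center] = 1.0
    a[center, center] = 0.0  # no self-loop on the central node
    return a / a.sum(axis=1, keepdims=True)


def star_propagate(x, center=0):
    """One mean-aggregation step over a star graph in O(N * F) time.

    x has shape (N, F). The central node receives the mean of all leaf
    features; every leaf receives the central node's features.
    """
    x = np.asarray(x, dtype=float)
    leaf = np.ones(x.shape[0], dtype=bool)
    leaf[center] = False
    out = np.empty_like(x)
    out[center] = x[leaf].mean(axis=0)  # leaves -> center
    out[leaf] = x[center]               # center -> leaves (broadcast)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
a = star_adjacency(6, center=2)

# Two hops: leaves -> center -> leaves, so every leaf sees every other leaf.
two_hop = star_propagate(star_propagate(x, center=2), center=2)
```

Under these assumptions, after two hops every leaf holds the mean of all leaf features, which is the sense in which a star keeps propagation global with only $N-1$ edges.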

Paper Structure (19 sections, 5 theorems, 15 equations, 8 figures, 7 tables)

Introduction
Related Work
Spatial-Temporal Graph Neural Networks
Lottery Ticket Hypothesis
Preliminaries
Notations and Problem Definition
GAT vs. AGCN
Graph Tickets Hypothesis
Method
Pre-Identifying the Graph Winning Ticket
Further Enhancements
Evaluation
Experimental Settings
Main Results
Analysis
...and 4 more sections

Key Result

Proposition 1

In a complete graph $\mathcal{K}_{N}$ of order $N$, there exists a graph $\mathcal{T}$ such that $\mathcal{T}$ is a spanning tree of $\mathcal{K}_{N}$ with diameter 2, and the topology of $\mathcal{T}$ satisfies the definition of a star spanning tree in Hypothesis 1.
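The proposition can be checked numerically for a small case. The sketch below is only an illustration, not the paper's proof: it assumes $N \ge 3$, labels nodes $0, \dots, N-1$, and uses hypothetical helper names. It builds a star spanning tree of $\mathcal{K}_N$ and measures its diameter with breadth-first search.

```python
from collections import deque


def star_spanning_tree(n, center=0):
    """Edges of a star on nodes 0..n-1: the center is joined to every other node."""
    return [(center, v) for v in range(n) if v != center]


def diameter(n, edges):
    """Longest shortest-path length in an undirected graph, via BFS from every node."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best


n = 7
tree_edges = star_spanning_tree(n, center=3)
complete_edge_count = n * (n - 1) // 2  # number of edges in K_n
tree_diameter = diameter(n, tree_edges)
```

For $N = 7$ the star keeps 6 of the 21 edges of $\mathcal{K}_7$, reaches every node, and any two leaves are exactly two hops apart through the central node.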

Figures (8)

Figure 1: A complete graph and a star spanning tree with a pre-specified node number.
Figure 2: Two-hop message-passing path of $\mathcal{T}^\star$ with pre-specified node numbers. The red node is the central node $u_c$ and the gray nodes are the leaf nodes $v \in \mathcal{V} \setminus \{ u_{c} \}$.
Figure 3: Training loss (a) and testing MAE (b) curves of the original AGCRN and AGCRN$^\ast$ trained on PEMS07.
Figure 4: Training loss (a) and testing MAE (b) curves of the original AGCRN and AGCRN$^\ast$ trained on SD.
Figure 5: Testing accuracy, measured in MAE, of AGCRN on the PEMS07, SD, GBA, GLA, and CA datasets, with the perturbation ratio $p$ ranging from 0% to 50%.
...and 3 more figures

Theorems & Definitions (5)

Proposition 1
Lemma 1
Lemma 2
Lemma 3
Lemma 4
