
You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu

TL;DR

The paper addresses whether untrained sparse subnetworks can match fully trained dense GNNs by introducing Untrained GNNs Tickets (UGTs), a global-plus-gradual sparsification pipeline that discovers subnetworks inside randomly initialized GNNs without weight updates. It formalizes masks, scores, and a sparsity schedule, and demonstrates that subnetworks can be found up to sparsities as high as $s_f \approx 0.99$, achieving competitive accuracy across GCN, GIN, and GAT on datasets including eight small graphs and large-scale OGBN-Arxiv. The results show that these untrained subnetworks mitigate over-smoothing in deep GNNs, preserve feature distinctions (as evidenced by MAD and TSNE), and exhibit strong OOD detection and robustness to perturbations, often outperforming Edge-Popup. The findings point to a new direction where performant GNNs can be obtained by identifying suitable untrained subnetworks within randomly weighted architectures, enabling deeper, more scalable models without weight optimization.

Abstract

Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find untrained sparse subnetworks at initialization that can match the performance of fully trained dense GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations. We evaluate our method across widely used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).
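To make the masks, scores, and sparsity schedule mentioned in the TL;DR more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the cubic ramp in `sparsity_at`, the helper names, and the randomly drawn scores are all hypothetical. In the actual method the scores would be learned while the weights stay frozen; that optimization step is omitted here.

```python
import random


def sparsity_at(step, total_steps, s_init=0.0, s_final=0.99):
    """Gradual schedule: ramp sparsity from s_init to s_final.

    The cubic shape is an assumption made for illustration only.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


def global_masks(scores, sparsity):
    """Global sparsification: rank scores across ALL layers together and
    keep the top (1 - sparsity) fraction, instead of pruning per layer."""
    flat = sorted((s for layer in scores for s in layer), reverse=True)
    n_keep = max(1, round(len(flat) * (1.0 - sparsity)))
    threshold = flat[n_keep - 1]
    return [[1 if s >= threshold else 0 for s in layer] for layer in scores]


rng = random.Random(0)
# Frozen random weights for two toy "layers"; they are never updated.
weights = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(2)]
# Placeholder scores (hypothetical); a real run would learn these.
scores = [[rng.random() for _ in range(50)] for _ in range(2)]

for step in (0, 5, 10):
    s = sparsity_at(step, total_steps=10)
    masks = global_masks(scores, s)
    kept = sum(sum(m) for m in masks)
    print(f"step={step} sparsity={s:.3f} kept={kept}")

# The subnetwork is the elementwise product of frozen weights and the mask.
subnetwork = [[w * m for w, m in zip(wl, ml)] for wl, ml in zip(weights, masks)]
```

Because the ranking is global, layers can end up with different densities, which is the practical difference from pruning each layer to the same sparsity.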

Paper Structure (22 sections, 5 equations, 14 figures, 6 tables, 1 algorithm)


Figures (14)

  • Figure 1: Performance of untrained graph subnetworks (UGTs (ours) and Edge-Popup (Ramanujan et al., 2020)) and the corresponding trained dense GNNs. As the model size increases, UGTs finds an untrained subnetwork within the random initialization that matches the performance of the corresponding fully trained dense GNN. The x-axis denotes the model size for each point, e.g. "64-2" represents a model with 2 layers and width 64.
  • Figure 2: The performance of GNNs with increasing model depths. Experiments are conducted on various GNNs with Cora, Citeseer, Pubmed and OGBN-Arxiv. We observe that as the model goes deeper, fully-trained dense GNNs suffer from a sharp accuracy drop, while UGTs preserves the high accuracy. All the results reported are averaged from 5 runs.
  • Figure 3: TSNE visualization of node representations learned by densely trained GCN and UGTs. Ten classes are randomly sampled from OGBN-Arxiv for visualization. Model depth is set as 16 and 32 respectively; width is set as 448. See the appendix for the GAT architecture.
  • Figure 4: Mean Average Distance among node representations of each GNN layer. Experiments are conducted on Cora with GCN containing 32 layers and width 448.
  • Figure 5: The accuracy of GNNs w.r.t. varying sparsities. Experiments are conducted on various GNNs with 2 layers and width 256 for Cora, Citeseer and Pubmed, and with 4 layers and width 386 for OGBN-Arxiv.
  • ...and 9 more figures
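
The Mean Average Distance (MAD) in Figure 4 is used as an over-smoothing indicator: values near zero mean node representations have collapsed toward one another. The caption does not spell out the formula, so the sketch below assumes the common cosine-distance formulation, averaging each node's non-zero distances to the other nodes and then averaging over nodes. The function names and the tolerance `eps` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_average_distance(reps, eps=1e-12):
    """Average, over nodes, of each node's mean non-zero cosine distance
    to every other node (assumed definition, see the note above)."""
    per_node = []
    for i, u in enumerate(reps):
        dists = [cosine_distance(u, v) for j, v in enumerate(reps) if j != i]
        nonzero = [d for d in dists if d > eps]
        if nonzero:
            per_node.append(sum(nonzero) / len(nonzero))
    return sum(per_node) / len(per_node) if per_node else 0.0


# Representations pointing in the same direction (over-smoothed) vs. orthogonal ones.
collapsed = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
distinct = [[1.0, 0.0], [0.0, 1.0]]
print(mean_average_distance(collapsed), mean_average_distance(distinct))
```

Computing this per layer, as the figure does, shows at which depth the representations start to become indistinguishable.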