Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification

Beini Xie, Heng Chang, Ziwei Zhang, Zeyang Zhang, Simin Wu, Xin Wang, Yuan Meng, Wenwu Zhu

TL;DR

The paper designs a joint graph data and architecture mechanism that identifies important sub-architectures via valuable graph data, achieving on-par or even higher node classification performance with half or fewer model parameters in the searched GNNs and a sparser graph.

Abstract

Graph Neural Architecture Search (GNAS) has achieved superior performance on various graph-structured tasks. However, existing GNAS studies overlook the application of GNAS in resource-constrained scenarios. This paper proposes a joint graph data and architecture mechanism that identifies important sub-architectures via valuable graph data. To search for optimal lightweight Graph Neural Networks (GNNs), we propose a Lightweight Graph Neural Architecture Search with Graph SparsIfication and Network Pruning (GASSIP) method. In particular, GASSIP comprises an operation-pruned architecture search module that enables efficient lightweight GNN search. Meanwhile, we design a novel curriculum graph data sparsification module with an architecture-aware edge-removing difficulty measurement to help select optimal sub-architectures. With the aid of two differentiable masks, we iteratively optimize these two modules to efficiently search for the optimal lightweight architecture. Extensive experiments on five benchmarks demonstrate the effectiveness of GASSIP. Notably, our method achieves on-par or even higher node classification performance with half or fewer model parameters in the searched GNNs and a sparser graph.
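
As context for the mechanism the abstract describes, below is a minimal runnable sketch, assuming a PyTorch setting, of how two differentiable masks (one over candidate operations, one over graph edges) can be optimized in alternation. The supernet stand-in, the soft-mask gating, and the sparsity-regularized toy loss are all illustrative assumptions, not GASSIP's released implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative two-mask setup: a differentiable mask over candidate operations
# (architecture side) and one over graph edges (data side). All names, shapes,
# and the toy loss below are placeholders, not the paper's code.
num_ops, num_edges, num_nodes, num_feats, num_classes = 3, 200, 50, 16, 4

op_mask = torch.nn.Parameter(torch.zeros(num_ops))        # operation-pruning mask
edge_mask = torch.nn.Parameter(torch.zeros(num_edges))    # graph-sparsification mask
supernet = torch.nn.Linear(num_feats, num_classes)        # stand-in for a GNN supernet

x = torch.randn(num_nodes, num_feats)                     # toy node features
y = torch.randint(0, num_classes, (num_nodes,))           # toy node labels
edge_index = torch.randint(0, num_nodes, (2, num_edges))  # toy edge list

opt_arch = torch.optim.Adam([op_mask, *supernet.parameters()], lr=1e-2)
opt_data = torch.optim.Adam([edge_mask], lr=1e-2)

def loss_fn():
    src, dst = edge_index
    w = torch.sigmoid(edge_mask).unsqueeze(-1)            # soft edge weights
    h = supernet(x) * torch.sigmoid(op_mask).mean()       # crude op-mask gating
    agg = torch.zeros_like(h).index_add_(0, dst, h[src] * w)  # masked aggregation
    ce = F.cross_entropy(agg, y)
    # Sparsity terms push both masks toward pruning/sparsification.
    return ce + 0.1 * (torch.sigmoid(op_mask).mean() + torch.sigmoid(edge_mask).mean())

for epoch in range(100):
    # (1) Architecture step: fix the learned graph, update supernet + op mask.
    opt_arch.zero_grad(); loss_fn().backward(); opt_arch.step()
    # (2) Data step: fix the architecture, update the differentiable edge mask.
    opt_data.zero_grad(); loss_fn().backward(); opt_data.step()
```

After search, operations and edges whose mask scores fall below a threshold would be pruned, yielding the lightweight architecture and sparser graph the abstract refers to.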

Paper Structure

This paper contains 23 sections, 12 equations, 5 figures, 9 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Overlaps of removed edges for GCN, GAT, GIN, and Random under different graph data sparsification.
  • Figure 2: The iterative training framework of GASSIP. The graph data and architecture parameters are optimized iteratively: the operation-pruned architecture search first receives the currently learned graph structure and then interleaves supernet training with operation pruning, while the curriculum graph data sparsification module estimates edge-removing difficulty from the node and architecture views and updates the graph structure via architecture sampling and sample reweighting (a sketch of this difficulty measurement follows the figure list).
  • Figure 3: Scatter plots showing the relationship between the total number of model parameters and node classification performance on (a) Cora, (b) CiteSeer, and (c) Physics. Methods marked with $*$ can perform graph sparsification. Points toward the upper left indicate higher classification performance with lower parameter counts.
  • Figure 4: Ablation study of GASSIP without operation pruning (w/o op prn), without graph data sparsification (w/o sp), and without the curriculum scheduler (w/o cur).
  • Figure 5: Line plots for hyper-parameters (a) $\lambda_1$ (with $\lambda_2=1$ fixed) and (b) $\lambda_2$ (with $\lambda_1=1$ fixed).
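
The Figure 2 caption outlines how edge-removing difficulty drives the curriculum. Below is a minimal sketch of one plausible reading, assuming PyTorch: the node view scores an edge by how dissimilar its endpoints are, the architecture view by how much sampled sub-architectures disagree about the edge, and the curriculum removes the easiest edges first. The functions `edge_difficulty` and `curriculum_keep_mask`, both score definitions, and the linear schedule are illustrative assumptions, not the paper's formulas.

```python
import torch
import torch.nn.functional as F

def edge_difficulty(x, edge_index, edge_scores_per_arch):
    """Per-edge removal difficulty: node view + architecture view (assumed forms)."""
    src, dst = edge_index
    # Node view: edges joining dissimilar endpoints are harder to judge.
    node_view = 1.0 - F.cosine_similarity(x[src], x[dst])
    # Architecture view: disagreement (variance) of edge importance scores
    # across several sampled sub-architectures.
    arch_view = edge_scores_per_arch.var(dim=0)
    return node_view + arch_view

def curriculum_keep_mask(difficulty, epoch, max_epochs, final_keep=0.5):
    """Remove easy (low-difficulty) edges first, tightening the budget over epochs."""
    keep_ratio = 1.0 - (1.0 - final_keep) * min(epoch / max_epochs, 1.0)
    k = max(1, int(keep_ratio * difficulty.numel()))
    keep = torch.zeros_like(difficulty, dtype=torch.bool)
    keep[difficulty.topk(k).indices] = True  # keep the k hardest edges
    return keep

# Toy usage: 100 nodes, 300 edges, scores from 4 sampled architectures.
x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 300))
scores = torch.rand(4, 300)
mask = curriculum_keep_mask(edge_difficulty(x, edge_index, scores),
                            epoch=10, max_epochs=50)
```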