Attention is all you need for boosting graph convolutional neural network

Yinwei Wu

TL;DR

The paper proposes a plug-in module, the Graph Knowledge Enhancement and Distillation Module (GKEDM), which improves node representations through multi-head attention aggregation and enables efficient knowledge transfer through attention-based distillation.

Abstract

Graph Convolutional Neural Networks (GCNs) are well suited to processing graph data in non-grid domains. They capture both the topological structure of a graph and its node features and integrate them into each node's final representation. GCNs have been studied extensively in fields such as recommendation systems, social networks, and protein molecular structure analysis. As graph neural networks are applied more widely, research has focused on improving their performance while reducing their size. In this work, a plug-in module named Graph Knowledge Enhancement and Distillation Module (GKEDM) is proposed. GKEDM enhances node representations and improves the performance of GCNs by extracting and aggregating graph information through a multi-head attention mechanism. GKEDM can also serve as an auxiliary transferor for knowledge distillation: with a specially designed attention distillation method, it distills the knowledge of large teacher models into compact, high-performance student models. Experiments on multiple datasets demonstrate that GKEDM significantly improves the performance of various GCNs with minimal overhead and efficiently transfers knowledge from large teacher networks to small student networks via attention distillation.
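The attention-based aggregation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `multi_head_graph_attention`, the restriction of attention to each node's neighbours plus itself, the residual connection, and the random projection matrices are all assumptions made for illustration. `H` stands in for the node embeddings produced by a trained GCN backbone.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_graph_attention(H, A, Wq, Wk, Wv, Wo, num_heads):
    """Hypothetical enhancement step: refine node embeddings H (n x d)
    with multi-head attention masked by adjacency A (n x n).
    Returns enhanced embeddings (n x d) and attention maps (heads x n x n)."""
    n, d = H.shape
    assert d % num_heads == 0, "d must be divisible by num_heads"
    dh = d // num_heads
    # Project, then split into heads: each array has shape (heads, n, dh).
    Q = (H @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    K = (H @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    V = (H @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product scores for every node pair, per head: (heads, n, n).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)
    # Assumption: each node attends only to itself and its neighbours.
    mask = (A + np.eye(n)) > 0
    scores = np.where(mask[None, :, :], scores, -np.inf)
    attn = softmax(scores, axis=-1)
    # Aggregate neighbour values, merge heads, project back to width d.
    out = (attn @ V).transpose(1, 0, 2).reshape(n, d) @ Wo
    # Assumption: a residual connection keeps the original GCN features.
    return H + out, attn

# Usage on a 5-node path graph 0-1-2-3-4 with random stand-in weights.
rng = np.random.default_rng(0)
n, d, heads = 5, 8, 2
H = rng.normal(size=(n, d))  # stand-in for backbone GCN output
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
Wq, Wk, Wv, Wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
H_new, attn = multi_head_graph_attention(H, A, Wq, Wk, Wv, Wo, num_heads=heads)
print(H_new.shape, attn.shape)  # (5, 8) (2, 5, 5)
```

Masking with the adjacency matrix keeps aggregation local, as in a GCN layer, while the learned per-head weights let each node weigh its neighbours differently. Dropping the mask would give full self-attention over all nodes, which the paper may use instead.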

Paper Structure (28 sections, 13 equations, 4 figures, 9 tables)

Introduction
Related work
Graph convolutional neural network
Graph neural network knowledge distillation
Attention mechanism
Background
Notations
Graph neural network and graph representation learning
Multi-head self-attention mechanism
Knowledge distillation
Methods
Graph knowledge enhancement module
Reason
Method
Graph knowledge distillation module
...and 13 more sections

Figures (4)

Figure 1: The existence of over-smoothing in graph neural networks
Figure 2: GKEDM knowledge enhancement module. The module works in two stages. In the first stage, GKEDM trains a GCN. In the second stage, GKEDM extracts the trained GCN's backbone and concatenates it with the GKEDM knowledge enhancement module for fine-tuning.
Figure 3: Knowledge distillation module of GKEDM. The GKEDM knowledge distillation module drives the student network to mimic the topology of the teacher network.
Figure 4: Relationship between $\alpha$ setting and distillation effect
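The attention distillation idea in Figure 3 and the role of $\alpha$ in Figure 4 can be sketched as a weighted training loss. This is a hypothetical NumPy sketch, not the paper's exact formulation: the row-wise KL divergence between head-averaged attention maps, the `(1 - alpha)` / `alpha` weighting, and the names `attention_distillation_loss` and `total_loss` are all assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean softmax cross-entropy over nodes (the student's task loss).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def attention_distillation_loss(attn_teacher, attn_student, eps=1e-12):
    # Assumption: average over heads so teacher and student may differ in
    # head count and hidden width; both averaged maps are n x n.
    p = attn_teacher.mean(axis=0)
    q = attn_student.mean(axis=0)
    # Row-wise KL(p || q): each row is one node's attention distribution.
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return kl.mean()

def total_loss(logits, labels, attn_teacher, attn_student, alpha):
    # Hypothetical weighting: alpha trades the task loss against distillation.
    task = cross_entropy(logits, labels)
    distill = attention_distillation_loss(attn_teacher, attn_student)
    return (1 - alpha) * task + alpha * distill

# Usage with random stand-in attention maps and logits.
rng = np.random.default_rng(1)
n, classes = 4, 3

def random_attention(heads):
    # Row-normalised random attention maps of shape (heads, n, n).
    s = rng.normal(size=(heads, n, n))
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

attn_t = random_attention(4)  # teacher: 4 heads
attn_s = random_attention(2)  # student: 2 heads
logits = rng.normal(size=(n, classes))
labels = np.array([0, 2, 1, 0])
for alpha in (0.0, 0.5, 1.0):
    print(alpha, total_loss(logits, labels, attn_t, attn_s, alpha))
```

Because head-averaged attention maps are n × n regardless of hidden width or head count, a wide teacher and a narrow student can be compared directly, which is one plausible reason attention is a convenient carrier for distilled graph knowledge. Sweeping `alpha` as in the loop above mirrors the kind of study summarized in Figure 4.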