Deep Graph Attention Networks

Jun Kato, Airi Mita, Keita Gobara, Akihiro Inokuchi

TL;DR

DeepGAT enables a large network to be trained so that it acquires attention coefficients similar to those of a network with few layers, and it obviates the need to tune the number of layers, thus saving time and enhancing GNN performance.

Abstract

Graphs are useful for representing various real-world objects. However, graph neural networks (GNNs) tend to suffer from over-smoothing: as the number of layers increases, the representations of nodes belonging to different classes become similar, which degrades performance. Effectively constructing a graph attention network (GAT), a type of GNN, therefore calls for a method that does not require protracted tuning of the number of layers. To this end, we introduce DeepGAT, a method for predicting the classes to which nodes belong in a deep GAT. DeepGAT avoids over-smoothing in a GAT by ensuring that the representations of nodes in different classes do not become similar at any layer. Using DeepGAT to predict class labels, we constructed a 15-layer network without tuning the number of layers. DeepGAT prevented over-smoothing, and the resulting 15-layer GAT achieved performance similar to that of a 2-layer GAT, as indicated by their similar attention coefficients. DeepGAT thus enables a large network to be trained so that it acquires attention coefficients similar to those of a network with few layers. It avoids the over-smoothing problem and obviates the need to tune the number of layers, thereby saving time and enhancing GNN performance.
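To make the over-smoothing effect concrete, the sketch below repeatedly applies plain mean aggregation (the row-normalized adjacency with self-loops, D^{-1}(A + I)) to a hypothetical two-class toy graph and tracks the distance between the two class-mean representations. This is an illustrative assumption, not DeepGAT and not a GAT: there are no learned weights, no attention, and no nonlinearity, and the graph, features, and helper names (`normalized_adjacency`, `propagate`, `class_mean_gap`) are made up for the example.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Row-normalized adjacency with self-loops: D^{-1}(A + I)."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)

def propagate(features: np.ndarray, adj: np.ndarray, num_layers: int) -> np.ndarray:
    """Apply num_layers rounds of parameter-free mean aggregation."""
    p = normalized_adjacency(adj)
    h = features
    for _ in range(num_layers):
        h = p @ h
    return h

def class_mean_gap(h: np.ndarray, labels: np.ndarray) -> float:
    """Euclidean distance between the mean representations of class 0 and class 1."""
    return float(np.linalg.norm(h[labels == 0].mean(axis=0) - h[labels == 1].mean(axis=0)))

# Hypothetical toy graph: two 3-node cliques (class 0 and class 1) joined by the edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)  # one-hot class indicator per node

gaps = {L: class_mean_gap(propagate(features, adj, L), labels) for L in (1, 2, 15, 50)}
for L, gap in gaps.items():
    print(f"{L:>2} layers: class-mean gap = {gap:.4f}")
```

Because the toy graph is connected and the self-loops make the walk aperiodic, every row of the repeated propagation converges to the same vector, so the class means are pulled together as depth grows; this is the degradation that DeepGAT is designed to avoid.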

Paper Structure

This paper contains 13 sections, 2 theorems, 12 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

We assume that the activation function $\sigma$ in a GCN is the identity function. Then the representations $\bm{h}^l_v$ in the GCN follow the normal distribution defined below. Here, $W_{1 \sim l}=W^lW^{l-1}\cdots W^1$. $\blacksquare$
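The identity-activation assumption in Lemma 1 matters because it lets an $l$-layer GCN collapse into a single linear map: the product $W_{1 \sim l}=W^lW^{l-1}\cdots W^1$ applied to $l$-hop aggregated inputs. Since a linear transformation of jointly Gaussian vectors is again Gaussian, this collapse is what makes a normal-distribution statement about $\bm{h}^l_v$ possible when the input features are Gaussian. The sketch below checks the collapse numerically. It assumes a row-normalized propagation matrix D^{-1}(A + I) and the column-vector convention $\bm{h}^l_v = W^l \sum_u \hat{a}_{vu} \bm{h}^{l-1}_u$ (inferred from the ordering of $W_{1 \sim l}$); the paper's exact normalization is not stated here, and the graph and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, num_layers = 5, 4, 3

# Hypothetical path graph 0-1-2-3-4 with self-loops, row-normalized: A_hat = D^{-1}(A + I).
adj = np.eye(num_nodes)
for i in range(num_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_hat = adj / adj.sum(axis=1, keepdims=True)

h0 = rng.normal(size=(num_nodes, dim))                              # row v holds h^0_v
weights = [rng.normal(size=(dim, dim)) for _ in range(num_layers)]  # W^1, ..., W^L

# Layer-by-layer linear GCN (sigma = identity): h^l_v = W^l * sum_u a_hat[v, u] h^{l-1}_u.
# With node representations stored as rows, this is H^l = A_hat H^{l-1} (W^l)^T.
h = h0
for w in weights:
    h = a_hat @ h @ w.T

# Collapsed form: build W_{1~L} = W^L W^{L-1} ... W^1 by left-multiplying each new layer.
w_1_to_L = np.eye(dim)
for w in weights:
    w_1_to_L = w @ w_1_to_L
h_collapsed = np.linalg.matrix_power(a_hat, num_layers) @ h0 @ w_1_to_L.T

print(np.allclose(h, h_collapsed))  # True
```

The equality holds because the propagation matrix acts on the node axis and the weights act on the feature axis, so all aggregation steps can be grouped into $\hat{A}^l$ and all weight matrices into $W_{1 \sim l}$.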

Figures (6)

  • Figure 1: Convolutions in graph neural networks.
  • Figure 2: The overlaps of two distributions.
  • Figure 3: Performance degradation of GAT.
  • Figure 4: Micro-F1 scores with various numbers of layers $L$ for the CS, Physics, Flickr, and PPI datasets.
  • Figure 5: Comparison between $D_{KL}(\bm{\alpha}_{v}^{2} || \bm{\alpha}_{v}^{L_{max}})$.
  • ...and 1 more figure
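Figure 5 compares attention distributions through $D_{KL}(\bm{\alpha}_{v}^{2} || \bm{\alpha}_{v}^{L_{max}})$. As a minimal sketch of how such a quantity can be computed, the code below derives single-head attention coefficients over a node's neighborhood using the scoring function of the original GAT (LeakyReLU of $\bm{a}^\top[W\bm{h}_v \,\|\, W\bm{h}_u]$ followed by a softmax over neighbors) and then takes the KL divergence between two such distributions. Assumptions: both distributions are over the same neighbor set including the self-loop, a single attention head is used, and the two coefficient vectors come from random parameters as stand-ins for the 2-layer and $L_{max}$-layer models, not from trained networks. The helper names `gat_attention` and `kl_divergence` are hypothetical.

```python
import numpy as np

def gat_attention(h: np.ndarray, w: np.ndarray, a: np.ndarray, v: int,
                  neighbors: list[int], negative_slope: float = 0.2) -> np.ndarray:
    """Single-head GAT coefficients alpha_{vu} for u in neighbors."""
    z = h @ w.T                                   # z_u = W h_u, one row per node
    scores = np.array([np.concatenate([z[v], z[u]]) @ a for u in neighbors])
    scores = np.where(scores > 0, scores, negative_slope * scores)  # LeakyReLU
    scores = scores - scores.max()                # shift for a numerically stable softmax
    exp = np.exp(scores)
    return exp / exp.sum()

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """D_KL(p || q) for two distributions over the same neighbor set."""
    p = np.clip(p, eps, None)                     # guard against log(0)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))                       # 4 nodes, 3 input features
neighbors = [0, 1, 2, 3]                          # node 0 and its neighbors, self-loop included
# Stand-ins for alpha_v^2 and alpha_v^{L_max}: random parameters, not trained models.
alpha_2 = gat_attention(h, rng.normal(size=(2, 3)), rng.normal(size=4), v=0, neighbors=neighbors)
alpha_lmax = gat_attention(h, rng.normal(size=(2, 3)), rng.normal(size=4), v=0, neighbors=neighbors)
print(kl_divergence(alpha_2, alpha_lmax))
```

A value near zero would indicate that the deep model attends to node $v$'s neighbors in nearly the same proportions as the shallow one, which is the kind of similarity the abstract refers to.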

Theorems & Definitions (3)

  • Lemma 1: yajima
  • Lemma 2
  • Proof 1