Partially Trained Graph Convolutional Networks Resist Oversmoothing

Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras

TL;DR

This work investigates the effect of training only a single layer of a GCN or a GAT (Graph Attention Network) while keeping the rest of the layers frozen, and proposes a basis for predicting the effect of the untrained layers and their contribution to the generated node embeddings.

Abstract

In this work we investigate an observation made by Kipf & Welling, who suggested that untrained GCNs can generate meaningful node embeddings. In particular, we study the effect of training only a single layer of a GCN while keeping the rest of the layers frozen. We propose a basis for predicting the effect of the untrained layers and their contribution to the generation of embeddings. Moreover, we show that network width influences the dissimilarity of the node embeddings produced after the initial node features pass through the untrained part of the model. Additionally, we establish a connection between partially trained GCNs and oversmoothing, showing that partially trained GCNs are capable of reducing it. We verify our theoretical results experimentally and show the benefits of using deep networks that resist oversmoothing in a "cold start" scenario, where feature information is lacking for unlabeled nodes.
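The setup described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard GCN propagation rule, Glorot-uniform initialization and ReLU activations; the class `PartiallyTrainedGCN` and its parameters are hypothetical, not the authors' code. Every layer except `trainable_layer` keeps its random initialization, and only the matrix returned by `trainable_parameters()` would be handed to an optimizer. The default index 1 corresponds to training the second layer, the configuration used in Figure 2.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


class PartiallyTrainedGCN:
    """Hypothetical deep GCN in which only one layer is trainable;
    all other layers keep their random (frozen) initialization."""

    def __init__(self, in_dim, width, out_dim, depth, trainable_layer=1, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [width] * (depth - 1) + [out_dim]
        # Glorot-uniform initialization (an assumption, not taken from the paper).
        self.weights = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            limit = np.sqrt(6.0 / (d_in + d_out))
            self.weights.append(rng.uniform(-limit, limit, size=(d_in, d_out)))
        self.trainable_layer = trainable_layer

    def forward(self, a_norm, x):
        h = x
        last = len(self.weights) - 1
        for k, w in enumerate(self.weights):
            h = a_norm @ h @ w          # propagate over the graph, then transform
            if k < last:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers only
        return h

    def trainable_parameters(self):
        # Only this weight matrix is updated during training; the rest stay frozen.
        return [self.weights[self.trainable_layer]]


# Usage: a 4-node path graph with one-hot features and an 8-layer model.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
model = PartiallyTrainedGCN(in_dim=4, width=64, out_dim=2, depth=8)
logits = model.forward(normalized_adjacency(adj), x)
print(logits.shape)  # (4, 2)
```

Freezing is expressed here simply by exposing a single weight matrix as trainable; in an autodiff framework the same idea corresponds to disabling gradients for the frozen layers.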

Paper Structure

This paper contains 15 sections, 7 theorems, 16 equations, 2 figures, and 5 tables.

Key Result

Theorem 1

Let $s_{lh}$ be the largest singular value of the weight matrix $W_{lh}$ of layer $\mathit{h}$, and let $s_l = \prod\limits_{h=1}^{H_l} s_{lh}$, where $H_l$ is the network's depth (following the notation of the original paper). Then it holds that $d_M(f_l(X)) \leq s_l \ldots$
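The right-hand side of the inequality is cut off in this summary. The check below assumes it has the form of the corresponding result of Oono & Suzuki, $d_M(f_l(X)) \leq s_l \lambda d_M(X)$, where $\lambda$ is the largest absolute eigenvalue of the propagation matrix $P$ other than the eigenvalue 1, $d_M$ is the Frobenius distance to the subspace $M$ in which all node embeddings have collapsed (oversmoothing), and $f_l(X)$ applies $P$ followed by an MLP with weights $W_{l1}, \dots, W_{lH_l}$ and ReLU. All of this, including the helper names, is an illustrative assumption rather than the paper's code.

```python
import numpy as np


def propagation_matrix(adj):
    # Augmented normalized adjacency P = D~^{-1/2} (A + I) D~^{-1/2}.
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def distance_to_m(x, adj):
    # d_M(X): Frobenius distance from X to M = span(e) ⊗ R^C, with e ∝ D~^{1/2} 1.
    # Assumes a connected graph, so the eigenvalue 1 of P is simple.
    e = np.sqrt(adj.sum(axis=1) + 1.0)
    e = e / np.linalg.norm(e)
    return np.linalg.norm(x - np.outer(e, e @ x))


def gcn_layer(x, p, weights):
    # f_l(X): propagate with P, then apply W_l1, ..., W_lH_l, each followed by ReLU.
    h = p @ x
    for w in weights:
        h = np.maximum(h @ w, 0.0)
    return h


rng = np.random.default_rng(0)
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):  # path graph 0-1-2-3-4 (connected)
    adj[i, i + 1] = adj[i + 1, i] = 1.0

p = propagation_matrix(adj)
x = rng.normal(size=(n, 3))
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 4))]  # H_l = 2

s_l = np.prod([np.linalg.norm(w, 2) for w in weights])  # product of largest singular values
eigs = np.linalg.eigvalsh(p)                             # ascending; the last one is 1
lam = np.max(np.abs(eigs[:-1]))                          # largest |eigenvalue| other than 1

lhs = distance_to_m(gcn_layer(x, p, weights), adj)
rhs = s_l * lam * distance_to_m(x, adj)
print(f"d_M(f_l(X)) = {lhs:.4f} <= s_l * lambda * d_M(X) = {rhs:.4f}")
```

Under this assumed form, the factor $s_l$ is the only term controlled by the weights: when $s_l \lambda < 1$, each layer moves the embeddings closer to $M$, which is the mechanism behind oversmoothing.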

Figures (2)

  • Figure 1: Architectural diagram of a partially trained GCN.
  • Figure 2: Comparison of a fully trained GCN with 5 partially trained GCN configurations of different widths, across 7 datasets and varying depth. The trainable layer is always the second.

Theorems & Definitions (7)

  • Theorem 1 (Suzuki)
  • Corollary 2 (Suzuki)
  • Theorem 3: Bai-Yin’s law (Bai)
  • Corollary 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
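Bai-Yin's law states that for an $n \times m$ random matrix with i.i.d. zero-mean entries of variance $\sigma^2$ (and finite fourth moment), the largest singular value approaches $\sigma(\sqrt{n} + \sqrt{m})$ as $n, m \to \infty$ with $n/m$ fixed. This summary does not show how the paper applies the law to network width, so the sketch below only checks the law empirically for square Gaussian weight matrices of increasing width; the function names are hypothetical.

```python
import numpy as np


def largest_singular_value(n_rows, n_cols, std=1.0, seed=0):
    # Largest singular value of a matrix with i.i.d. N(0, std^2) entries.
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, std, size=(n_rows, n_cols))
    return np.linalg.norm(w, 2)  # ord=2 on a matrix gives the spectral norm


def bai_yin_prediction(n_rows, n_cols, std=1.0):
    # Asymptotic value std * (sqrt(n) + sqrt(m)) given by Bai-Yin's law.
    return std * (np.sqrt(n_rows) + np.sqrt(n_cols))


ratios = {}
for width in (64, 256, 1024):
    # Square width x width matrix, as for a hidden layer of that width.
    observed = largest_singular_value(width, width)
    predicted = bai_yin_prediction(width, width)
    ratios[width] = observed / predicted
    print(f"width={width:5d}  s_max={observed:8.2f}  prediction={predicted:8.2f}")
```

Because the prediction scales as $\sigma(\sqrt{n} + \sqrt{m})$, whether the largest singular value of a frozen random layer grows with width depends on how the initialization variance is chosen: with a fixed $\sigma$ it grows like $\sqrt{\text{width}}$, whereas with $\sigma = 1/\sqrt{\text{width}}$ it stays close to 2 for square matrices.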