Reducing Oversmoothing through Informed Weight Initialization in Graph Neural Networks

Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras

Abstract

In this work, we generalize the ideas of Kaiming initialization to Graph Neural Networks (GNNs) and propose a new scheme (G-Init) that reduces oversmoothing, leading to very good results in node and graph classification tasks. GNNs are commonly initialized using methods designed for other types of neural networks, which overlook the underlying graph topology. We theoretically analyze the variance of signals flowing forward and gradients flowing backward in the class of convolutional GNNs. We then simplify our analysis to the case of the GCN and propose a new initialization method. Our results indicate that the new method (G-Init) reduces oversmoothing in deep GNNs, facilitating their effective use. Experimental validation supports our theoretical findings, demonstrating the advantages of deep networks in scenarios with no feature information for unlabeled nodes (i.e., the "cold start" scenario).
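To make the point about graph topology concrete, the following minimal sketch applies standard Kaiming normal initialization (std = sqrt(2 / fan_in)) and compares the pre-activation variance of a plain linear layer with that of a single GCN-style propagation step. It is not G-Init, whose formula is not given in this excerpt. The ring graph, the layer width, and the helper names `kaiming_normal` and `ring_gcn_operator` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_normal(fan_in, fan_out, rng):
    """Standard Kaiming (He) normal init for ReLU: std = sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def ring_gcn_operator(n):
    """Symmetrically normalized adjacency with self-loops for an n-node ring
    (a hypothetical toy graph, not one of the paper's datasets)."""
    A = np.eye(n)
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    deg = A.sum(axis=1)
    return A / np.sqrt(np.outer(deg, deg))

n_nodes, width = 200, 64
A_hat = ring_gcn_operator(n_nodes)
X = rng.normal(size=(n_nodes, width))   # i.i.d. unit-variance input features
W = kaiming_normal(width, width, rng)

# Pre-activation variance without and with neighborhood aggregation.
var_plain = np.var(X @ W)
var_graph = np.var(A_hat @ X @ W)
print(f"plain linear layer: {var_plain:.3f}")
print(f"GCN propagation:    {var_graph:.3f}")
```

For i.i.d. inputs, aggregation scales the variance at node i by the sum of the squared entries in row i of the propagation matrix, which is 1/3 for every node of this ring. An initialization that only looks at fan-in therefore does not account for this topology-dependent factor.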

Paper Structure

This paper contains 13 sections, 12 theorems, 49 equations, 9 figures, 3 tables.

Key Result

Theorem 1

Let $s_l=\prod\limits_{h=1}^{H_l}{s_{lh}}$, where $s_{lh}$ is the largest singular value of the weight matrix $W_{lh}$, and let $s = \sup_{l\in \mathbb{N}_+} s_{l}$. Then $d_M(X^{(l)}) = O((s\lambda)^l)$, where $l$ is the layer number; if $s\lambda < 1$, the distance from the oversmoothing subspace decreases exponentially with $l$.
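The bound can be checked numerically on a small example. The sketch below rests on several assumptions that this excerpt does not state: $\lambda$ is taken to be the second-largest absolute eigenvalue of the symmetrically normalized adjacency matrix with self-loops, the graph is connected so that the oversmoothing subspace is spanned by the square-root degree vector, each layer has a single weight matrix ($H_l = 1$), and the nonlinearity is ReLU. The toy graph, the sizes, and the helper `dist_to_M` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(Z):
    return np.maximum(Z, 0.0)

# Connected toy graph (assumption): an 8-node ring plus one chord.
n, width, depth = 8, 4, 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, 4] = A[4, 0] = 1.0

# Symmetrically normalized adjacency with self-loops.
A_tilde = A + np.eye(n)
deg = A_tilde.sum(axis=1)
A_hat = A_tilde / np.sqrt(np.outer(deg, deg))

# Unit eigenvector of A_hat for eigenvalue 1; it spans the assumed subspace.
e = np.sqrt(deg) / np.linalg.norm(np.sqrt(deg))
# Assumed lambda: second-largest absolute eigenvalue of A_hat.
lam = np.sort(np.abs(np.linalg.eigvalsh(A_hat)))[-2]

def dist_to_M(X):
    """Frobenius distance of X from {e w^T}: remove each column's component along e."""
    return np.linalg.norm(X - np.outer(e, e @ X))

# One weight matrix per layer, rescaled to spectral norm 1 so that s * lam < 1.
weights = []
for _ in range(depth):
    W = rng.normal(size=(width, width))
    weights.append(W / np.linalg.norm(W, 2))
s = max(np.linalg.norm(W, 2) for W in weights)

X = rng.normal(size=(n, width))
dists = [dist_to_M(X)]
for W in weights:
    X = relu(A_hat @ X @ W)
    dists.append(dist_to_M(X))

for l, d in enumerate(dists):
    print(f"layer {l}: d_M = {d:.4e}, bound = {(s * lam) ** l * dists[0]:.4e}")
```

Rescaling the weights to spectral norm 1 is only a convenience that forces $s\lambda < 1$ in this example. With larger weights the same inequality still holds, but the bound no longer implies decay.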

Figures (9)

  • Figure 1: Variance plots on the Cora dataset.
  • Figure 2: Comparison between 6 weight initialization methods across 8 datasets for varying GCN model depth.
  • Figure 3: t-SNE plot of the Cora dataset. The upper row presents results for a G-Init initialized 32-layer GCN, while the lower row showcases results for a Kaiming Normal initialized 32-layer GCN.
  • Figure 4: Comparison of the time needed by 6 weight initialization methods to initialize a GCN as its depth increases.
  • Figure 5: t-SNE plot of the Cora dataset. The upper row presents results for a G-Init initialized 32-layer GCN, while the lower row showcases results for a Kaiming Normal initialized 32-layer GCN.
  • ...and 4 more figures

Theorems & Definitions (12)

  • Theorem 1: Suzuki
  • Theorem 2: Circular Law Conjecture (Tao)
  • Lemma 3
  • Theorem 4
  • Lemma 5
  • Theorem 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 2 more