GNN-VPA: A Variance-Preserving Aggregation Strategy for Graph Neural Networks

Lisa Schneckenreiter; Richard Freinschlag; Florian Sestak; Johannes Brandstetter; Günter Klambauer; Andreas Mayr

TL;DR

Using signal propagation theory, a variance-preserving aggregation function (VPA) is proposed that maintains expressivity while improving forward and backward dynamics, which could pave the way towards normalizer-free or self-normalizing GNNs.

Abstract

Graph neural networks (GNNs), and especially message-passing neural networks, excel in various domains such as physics, drug discovery, and molecular modeling. The expressivity of GNNs with respect to their ability to discriminate non-isomorphic graphs critically depends on the functions employed for message aggregation and graph-level readout. By applying signal propagation theory, we propose a variance-preserving aggregation function (VPA) that maintains expressivity, but yields improved forward and backward dynamics. Experiments demonstrate that VPA leads to increased predictive performance for popular GNN architectures as well as improved learning dynamics. Our results could pave the way towards normalizer-free or self-normalizing GNNs.

Paper Structure (14 sections, 3 theorems, 8 equations, 2 figures, 2 tables)

Introduction and related work
GNNs with Variance Preservation
Experiments
Discussion
Theoretical Details
Aggregation vs. Pooling
MLP Signal Propagation
Proof of Lemma 1
Proof of Lemma 2
Proof of Lemma 3
Experimental Details & Further Results
Implementation Details
Extended results
Learning Dynamics

Key Result

Lemma 1

Let $z_1,\ldots, z_N$ be independent copies of a centered random variable $z$ with finite variance. Then the random variable $y=\frac{1}{\sqrt{N}}\sum_{n=1}^N z_n$ has the same mean and variance as $z$.
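
Lemma 1 can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes standard normal messages, and the helper name `aggregate` as well as the values of `N` and `samples` are illustrative choices. It compares the variance of the aggregated signal under sum, mean, and the $\frac{1}{\sqrt{N}}$-scaled sum from the lemma.

```python
import numpy as np


def aggregate(z, mode):
    """Aggregate N messages along axis 0; z has shape (N, samples).

    Hypothetical helper for illustration, not part of any library.
    """
    n = z.shape[0]
    if mode == "sum":
        return z.sum(axis=0)
    if mode == "mean":
        return z.mean(axis=0)
    if mode == "vpa":
        # Scaled sum from Lemma 1: y = (1 / sqrt(N)) * sum_n z_n
        return z.sum(axis=0) / np.sqrt(n)
    raise ValueError(f"unknown mode: {mode}")


rng = np.random.default_rng(0)
N, samples = 16, 200_000  # assumed values, chosen only for the demo

# Independent copies of a centered random variable with unit variance.
z = rng.standard_normal((N, samples))

# Empirical variance of the aggregated signal for each strategy.
variances = {m: float(aggregate(z, m).var()) for m in ("sum", "mean", "vpa")}
for mode, var in variances.items():
    print(f"{mode:>4}: variance = {var:.3f}")
```

Under these assumptions, the sum should scale the variance by roughly $N$, the mean should shrink it by roughly $1/N$, and the scaled sum should leave it close to the variance of $z$, as the lemma states.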

Figures (2)

Figure 1: Overview of main message aggregation functions and their properties.
Figure B1: Learning curves of the GIN architecture with different aggregation functions on the TUDataset benchmarks used by Xu et al. (2019), retrieved in the version provided by Morris et al. (2020). Note that the default hyperparameters are tuned for sum aggregation. Nevertheless, training converges faster with variance-preserving aggregation (VPA) than with sum aggregation. At the same time, VPA maintains expressivity, whereas mean and max aggregation decrease the expressivity of GNNs.
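
The expressivity remark in the Figure B1 caption can be illustrated with a toy case. This is a hedged sketch rather than the paper's argument: it assumes scalar messages and uses a hypothetical helper `aggregate` to test whether each aggregation function can tell apart two neighborhoods that contain the same message once and twice.

```python
import numpy as np


def aggregate(messages, mode):
    """Aggregate a multiset of scalar messages (illustrative helper)."""
    x = np.asarray(messages, dtype=float)
    if mode == "sum":
        return x.sum()
    if mode == "mean":
        return x.mean()
    if mode == "max":
        return x.max()
    if mode == "vpa":
        # Sum scaled by 1 / sqrt(N), with N the number of messages.
        return x.sum() / np.sqrt(len(x))
    raise ValueError(f"unknown mode: {mode}")


# Two different neighborhoods (multisets) built from the same message.
one_neighbor = [1.0]
two_neighbors = [1.0, 1.0]

# An aggregation distinguishes the multisets if its outputs differ.
distinguishes = {
    mode: not np.isclose(aggregate(one_neighbor, mode),
                         aggregate(two_neighbors, mode))
    for mode in ("sum", "mean", "max", "vpa")
}
print(distinguishes)
```

In this example, mean and max map both neighborhoods to the same value, while sum and the scaled sum do not. A single pair of multisets does not establish expressivity in general; it only shows the kind of collision that mean and max aggregation permit.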

Theorems & Definitions (6)

Lemma 1
Lemma 2
Lemma 3
proof
proof
proof
