GRANOLA: Adaptive Normalization for Graph Neural Networks

Moshe Eliasof; Beatrice Bevilacqua; Carola-Bibiane Schönlieb; Haggai Maron

TL;DR

This paper proposes GRANOLA, a novel graph-adaptive normalization layer that normalizes node features by adapting to the specific characteristics of the graph, in particular by generating expressive representations of its neighborhood structure through the propagation of Random Node Features (RNF) in the graph.

Abstract

In recent years, significant efforts have been made to refine the design of Graph Neural Network (GNN) layers, aiming to overcome diverse challenges such as limited expressive power and oversmoothing. Despite their widespread adoption, off-the-shelf normalization layers like BatchNorm or InstanceNorm may not effectively capture the unique characteristics of graph-structured data when incorporated into a GNN architecture, potentially reducing the expressive power of the overall architecture. Moreover, existing graph-specific normalization layers often struggle to offer substantial and consistent benefits. In this paper, we propose GRANOLA, a novel graph-adaptive normalization layer. Unlike existing normalization layers, GRANOLA normalizes node features by adapting to the specific characteristics of the graph, in particular by generating expressive representations of its neighborhood structure through the propagation of Random Node Features (RNF) in the graph. We present theoretical results that support our design choices. Our extensive empirical evaluation on various graph benchmarks underscores the superior performance of GRANOLA over existing normalization techniques. Furthermore, GRANOLA emerges as the top-performing method among all baselines with the same time complexity as Message Passing Neural Networks (MPNNs).
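The pipeline described above (and in the Figure 3 caption) can be sketched for a single graph in NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: GNN_layer and GNN_norm are each a single sum-aggregation message-passing step, the per-node affine parameters gamma and beta come from linear heads on the GNN_norm output, and the features are standardized with a per-graph, per-channel mean and standard deviation (InstanceNorm-style). All function and parameter names are hypothetical.

```python
import numpy as np


def message_pass(A, X, W_self, W_nei):
    """One sum-aggregation message-passing step: X W_self + (A X) W_nei."""
    return X @ W_self + (A @ X) @ W_nei


def init_params(c_in, c_out, rnf_dim, d_z, rng, scale=0.5):
    """Random weights for the hypothetical GNN_layer, GNN_norm and affine heads."""
    def w(*shape):
        return scale * rng.standard_normal(shape)
    return {
        "W1_self": w(c_in, c_out), "W1_nei": w(c_in, c_out),                    # GNN_layer
        "Wz_self": w(c_out + rnf_dim, d_z), "Wz_nei": w(c_out + rnf_dim, d_z),  # GNN_norm
        "W_gamma": w(d_z, c_out), "W_beta": w(d_z, c_out),                      # affine heads
    }


def granola_style_layer(A, H_prev, p, rnf_dim, rng, eps=1e-5):
    """Sketch of one GRANOLA-style layer on a single graph with N nodes."""
    # 1) GNN_layer: intermediate node features H_tilde, shape (N, C).
    H_tilde = np.maximum(message_pass(A, H_prev, p["W1_self"], p["W1_nei"]), 0.0)
    # 2) Fresh random node features R ~ N(0, 1), redrawn on every call.
    R = rng.standard_normal((A.shape[0], rnf_dim))
    # 3) GNN_norm on [H_tilde || R] yields neighborhood-aware embeddings Z.
    X = np.concatenate([H_tilde, R], axis=1)
    Z = np.tanh(message_pass(A, X, p["Wz_self"], p["Wz_nei"]))
    # 4) Per-node, per-channel affine parameters predicted from Z.
    gamma = Z @ p["W_gamma"]
    beta = Z @ p["W_beta"]
    # 5) Standardize each channel with mean/std over the nodes of this graph.
    mu = H_tilde.mean(axis=0, keepdims=True)
    sigma = np.sqrt(H_tilde.var(axis=0, keepdims=True) + eps)
    return gamma * (H_tilde - mu) / sigma + beta


# Usage: a 4-node path graph with constant input features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H0 = np.ones((4, 3))
rng = np.random.default_rng(0)
params = init_params(c_in=3, c_out=8, rnf_dim=4, d_z=16, rng=rng)
out1 = granola_style_layer(A, H0, params, rnf_dim=4, rng=rng)
out2 = granola_style_layer(A, H0, params, rnf_dim=4, rng=rng)
print(out1.shape)  # (4, 8)
```

Because R is redrawn on every call, the two calls return different outputs for identical inputs: the normalization parameters are random in the same sense as the features of an MPNN with RNF.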

Paper Structure (44 sections, 5 theorems, 47 equations, 5 figures, 14 tables, 1 algorithm)

Introduction
Normalization layers for GNNs
Basic setup and definitions
Current normalization layers for GNNs
Method
Motivation
Design considerations and overview
GRANOLA
GRANOLA layer
GRANOLA-no-RNF
Complexity
Theoretical Analysis
Experimental Results
Baselines
Conclusions
...and 29 more sections

Key Result

Proposition 4.0

Assume our input domain consists of graphs of a fixed size. For every MPNN with GRANOLA-no-RNF (GRANOLA without random node features), there exists a standard MPNN with the same expressive power.
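Why RNF matter can be illustrated with a small NumPy experiment of our own (a toy illustration, not the paper's proof). Two disjoint triangles and a 6-cycle are both 2-regular, so a deterministic sum-aggregation MPNN started from constant features assigns every node the same embedding in both graphs. With random node features R, the product R_i (A^3 R)_i has expectation (A^3)_{ii}, the number of closed 3-walks at node i, which is 2 for a triangle node and 0 on the 6-cycle. The graph pair, the estimator, and the number of random channels are assumptions made for illustration.

```python
import numpy as np


def cycle_graph(n):
    """Adjacency matrix of the n-cycle (n = 3 gives a triangle)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A


def deterministic_mpnn(A, layers=3):
    """Sum aggregation with a self term, started from constant features."""
    h = np.ones(A.shape[0])
    for _ in range(layers):
        h = h + A @ h
    return h


def closed_3walk_estimate(A, num_rnf, rng):
    """Estimate (A^3)_{ii} per node by propagating random node features."""
    R = rng.standard_normal((A.shape[0], num_rnf))  # one column per RNF channel
    P = A @ (A @ (A @ R))                           # three propagation steps
    return (R * P).mean(axis=1)                     # E[R_i (A^3 R)_i] = (A^3)_ii


two_triangles = np.kron(np.eye(2), cycle_graph(3))  # block-diagonal: 2 triangles
hexagon = cycle_graph(6)
rng = np.random.default_rng(0)

same = np.array_equal(deterministic_mpnn(two_triangles), deterministic_mpnn(hexagon))
est_tri = closed_3walk_estimate(two_triangles, num_rnf=4096, rng=rng)
est_hex = closed_3walk_estimate(hexagon, num_rnf=4096, rng=rng)
print(same)  # True: the deterministic MPNN cannot tell the graphs apart
print(est_tri.round(1), est_hex.round(1))  # approximately 2 and 0 per node
```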

Figures (5)

Figure 1: Illustration of normalization layers. We denote by $B$, $N$ and $C$ the number of graphs (batch size), nodes, and channels (node features), respectively. For simplicity of presentation, we use the same number of nodes for all graphs. We color in blue the elements used to compute the statistics employed inside the normalization layer.
Figure 2: A batch of two graphs, where subtracting the mean of the node features computed across the batch, as in BatchNorm and related methods, results in the loss of capacity to compute node degrees.
Figure 3: Illustration of a GRANOLA layer. Given node features $\mathbf{H}^{(\ell-1)}_b$ and the adjacency matrix $\mathbf{A}_b$, we feed them to a $\text{GNN}^{(\ell-1)}_{\textsc{layer}}$ to extract intermediate node features $\tilde{\mathbf{H}}_b^{(\ell)}$. Then, we predict normalization parameters using $\text{GNN}^{(\ell)}_{\textsc{norm}}$, which takes the sampled RNF $\mathbf{R}^{(\ell)}_b$, $\tilde{\mathbf{H}}^{(\ell)}_b$, and $\mathbf{A}_b$. Including $\mathbf{R}_b^{(\ell)}$ alongside $\mathbf{A}_b$ and $\tilde{\mathbf{H}}^{(\ell)}_b$ enhances the expressiveness of GRANOLA, ensuring full adaptivity.
Figure 4: Training convergence of GRANOLA compared with existing normalization techniques, showing that GRANOLA achieves faster convergence and an overall lower (better) MAE.
Figure 5: A graph, where subtracting the mean of the node features computed on the feature dimension, as in LayerNorm-node and related methods, results in the loss of capacity to compute node degrees.
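The statistics highlighted in Figure 1 differ only in which axes of the B x N x C feature tensor the mean and variance are taken over. A minimal NumPy sketch, assuming equal graph sizes as in the figure and omitting the learnable affine parameters; the function names are ours:

```python
import numpy as np


def standardize(X, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over the given axes."""
    mu = X.mean(axis=axes, keepdims=True)
    var = X.var(axis=axes, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)


def batch_norm(X):
    # One statistic per channel, pooled over all graphs and nodes (axes B, N).
    return standardize(X, axes=(0, 1))


def instance_norm(X):
    # One statistic per graph and channel, pooled over that graph's nodes (axis N).
    return standardize(X, axes=(1,))


def layer_norm_node(X):
    # One statistic per node, pooled over its channels (axis C).
    return standardize(X, axes=(2,))


B, N, C = 2, 5, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((B, N, C)) + np.arange(C)  # channel-dependent offsets
bn, inorm, ln = batch_norm(X), instance_norm(X), layer_norm_node(X)
```

Each function zeroes the mean over exactly the elements that the figure colors in blue for that layer.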

Theorems & Definitions (15)

Proposition 4.0: RNF are necessary in GRANOLA for increased expressive power
Example C.1: BatchNorm reduces GNN capabilities to compute node degrees (Figure 2)
Remark C.2: Deeper networks are also limited
Remark C.3: BN with affine transformation is also limited
Example C.4: InstanceNorm reduces GNN capabilities to compute node degrees (Figure 2, considering all graphs as disconnected components of a single graph)
Example C.5: LayerNorm-node reduces GNN capabilities to compute node degrees (Figure 5)
Remark C.6: Deeper networks are also limited
Theorem E.1: Existing normalization techniques limit MPNNs’ expressivity
proof
Proposition E.1: MPNN with GRANOLA can implement MPNN with RNF
...and 5 more
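Examples C.1 and C.5 can be reproduced numerically on small graphs of our own choosing (not necessarily those drawn in Figures 2 and 5). With constant input features, one sum-aggregation layer produces h_v = deg(v) w for a weight vector w, which we pick arbitrarily. BatchNorm (without its affine step) subtracts the batch mean per channel, so a batch whose nodes all have degree 2 and a batch whose nodes all have degree 3 are both mapped to zero. LayerNorm-node standardizes each node over its channels: mean subtraction leaves deg(v)(w - mean(w)), and the division by the channel standard deviation cancels deg(v), so a degree-1 node and a degree-2 node receive the same output.

```python
import numpy as np


def cycle_graph(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A


def block_diag(*blocks):
    """Stack the graphs of a batch as disconnected components of one adjacency."""
    n = sum(b.shape[0] for b in blocks)
    A = np.zeros((n, n))
    start = 0
    for b in blocks:
        k = b.shape[0]
        A[start:start + k, start:start + k] = b
        start += k
    return A


def sum_layer_on_constant_features(A, w):
    # h_v = sum over neighbors of 1 * w = deg(v) * w, shape (num_nodes, C).
    deg = A @ np.ones(A.shape[0])
    return deg[:, None] * w[None, :]


def batch_norm(H, eps=1e-5):
    # Per-channel statistics over every node in the batch (no affine step).
    return (H - H.mean(axis=0)) / np.sqrt(H.var(axis=0) + eps)


def layer_norm_node(H, eps=1e-5):
    # Per-node statistics over the C channels (no affine step).
    mu = H.mean(axis=1, keepdims=True)
    return (H - mu) / np.sqrt(H.var(axis=1, keepdims=True) + eps)


w = np.array([1.0, 2.0, 3.0])      # hypothetical layer weights
K4 = np.ones((4, 4)) - np.eye(4)   # complete graph on 4 nodes

batch_deg2 = block_diag(cycle_graph(3), cycle_graph(4))  # every node has degree 2
batch_deg3 = block_diag(K4, K4)                          # every node has degree 3
bn2 = batch_norm(sum_layer_on_constant_features(batch_deg2, w))
bn3 = batch_norm(sum_layer_on_constant_features(batch_deg3, w))

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # degrees 1, 2, 1
H_path = sum_layer_on_constant_features(path, w)
ln_path = layer_norm_node(H_path)

print(np.allclose(bn2, 0.0), np.allclose(bn3, 0.0))    # True True
print(np.allclose(ln_path[0], ln_path[1], atol=1e-4))  # True
```

Applying a shared affine transformation afterwards maps both all-zero batches to the same values, in line with Remark C.3.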
