GRANOLA: Adaptive Normalization for Graph Neural Networks

Moshe Eliasof, Beatrice Bevilacqua, Carola-Bibiane Schönlieb, Haggai Maron

TL;DR

This paper proposes GRANOLA, a novel graph-adaptive normalization layer that normalizes node features by adapting to the specific characteristics of the graph. It does so by generating expressive representations of the graph's neighborhood structure, obtained by propagating Random Node Features (RNF) through the graph.

Abstract

In recent years, significant efforts have been made to refine the design of Graph Neural Network (GNN) layers, aiming to overcome diverse challenges, such as limited expressive power and oversmoothing. Despite their widespread adoption, the incorporation of off-the-shelf normalization layers like BatchNorm or InstanceNorm within a GNN architecture may not effectively capture the unique characteristics of graph-structured data, potentially reducing the expressive power of the overall architecture. Moreover, existing graph-specific normalization layers often struggle to offer substantial and consistent benefits. In this paper, we propose GRANOLA, a novel graph-adaptive normalization layer. Unlike existing normalization layers, GRANOLA normalizes node features by adapting to the specific characteristics of the graph, particularly by generating expressive representations of its neighborhood structure, obtained by leveraging the propagation of Random Node Features (RNF) in the graph. We present theoretical results that support our design choices. Our extensive empirical evaluation of various graph benchmarks underscores the superior performance of GRANOLA over existing normalization techniques. Furthermore, GRANOLA emerges as the top-performing method among all baselines within the same time complexity of Message Passing Neural Networks (MPNNs).

Paper Structure (44 sections, 5 theorems, 47 equations, 5 figures, 14 tables, 1 algorithm)

This paper contains 44 sections, 5 theorems, 47 equations, 5 figures, 14 tables, 1 algorithm.

Key Result

Proposition 4.0

Assume our input domain consists of graphs of a specific size. For every MPNN with Granola-no-rnf (eq:hat-h-features) there exists a standard MPNN with the same expressive power.

Figures (5)

  • Figure 1: Illustration of normalization layers. We denote by $B$, $N$ and $C$ the number of graphs (batch size), nodes, and channels (node features), respectively. For simplicity of presentation, we use the same number of nodes for all graphs. We color in blue the elements used to compute the statistics employed inside the normalization layer.
  • Figure 2: A batch of two graphs, where subtracting the mean of the node features computed across the batch, as in BatchNorm and related methods, results in the loss of capacity to compute node degrees.
  • Figure 3: Illustration of a Granola layer. Given node features $\mathbf{H}^{(\ell-1)}_b$ and the adjacency matrix $\mathbf{A}_b$, we feed them to a $\text{GNN}^{(\ell-1)}_{\textsc{layer}}$ to extract intermediate node features $\tilde{\mathbf{H}}^{(\ell)}_b$. Then, we predict normalization parameters using $\text{GNN}^{(\ell)}_{\textsc{norm}}$, which takes the sampled RNF $\mathbf{R}^{(\ell)}_b$, $\tilde{\mathbf{H}}^{(\ell)}_b$, and $\mathbf{A}_b$ as input. Including $\mathbf{R}^{(\ell)}_b$ with $\mathbf{A}_b$ and $\tilde{\mathbf{H}}^{(\ell)}_b$ enhances the expressiveness of Granola, ensuring full adaptivity.
  • Figure 4: Training convergence of Granola compared with existing normalization techniques, showing that Granola achieves faster convergence and an overall lower (better) MAE.
  • Figure 5: A graph, where subtracting the mean of the node features computed on the feature dimension, as in LayerNorm-node and related methods, results in the loss of capacity to compute node degrees.
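
The data flow described in the Figure 3 caption can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`message_pass`, `granola_like_layer`, `random_matrix`), the single sum-aggregation step used for both GNNs, the random stand-in weights, the choice of per-node statistics over channels, and the split of the predicted parameters into a scale `gamma` and a shift `beta` are all assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (hypothetical; a real model would train these)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def message_pass(A, H, W, relu):
    """One sum-aggregation step with self-loops: (A + I) H W, optionally with ReLU."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    out = matmul(matmul(A_hat, H), W)
    return [[max(0.0, v) for v in row] for row in out] if relu else out

def granola_like_layer(A, H, W_layer, W_norm, rng, rnf_dim=2, eps=1e-5):
    # 1) GNN_layer: intermediate node features H_tilde (n x C).
    H_tilde = message_pass(A, H, W_layer, relu=True)
    # 2) Sample fresh Random Node Features R (n x rnf_dim).
    R = [[rng.gauss(0.0, 1.0) for _ in range(rnf_dim)] for _ in H_tilde]
    # 3) GNN_norm: sees R concatenated with H_tilde, plus A; emits 2C values per node.
    Z = message_pass(A, [r + h for r, h in zip(R, H_tilde)], W_norm, relu=False)
    out = []
    for h, z in zip(H_tilde, Z):
        C = len(h)
        gamma, beta = z[:C], z[C:]  # assumption: predicted per-node scale and shift
        mu = sum(h) / C  # assumption: statistics taken per node, over channels
        sigma = (sum((x - mu) ** 2 for x in h) / C + eps) ** 0.5
        out.append([g * (x - mu) / sigma + b for x, g, b in zip(h, gamma, beta)])
    return out

rng = random.Random(0)
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph on 3 nodes
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 nodes, C = 2 channels
W_layer = random_matrix(2, 2, rng)
W_norm = random_matrix(2 + 2, 2 * 2, rng)  # (rnf_dim + C) x 2C
out = granola_like_layer(A, H, W_layer, W_norm, rng)
```

The point of the sketch is that the scale and shift are produced by a second GNN that sees the graph structure and the RNF, so they can differ per node and per graph instead of being fixed learned constants.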

Theorems & Definitions (15)

  • Proposition 4.0: RNF are necessary in Granola for increased expressive power
  • Example C.1: BatchNorm reduces GNN capabilities to compute node degrees (Figure 2)
  • Remark C.2: Deeper networks are also limited
  • Remark C.3: BN with Affine transformation is also limited
  • Example C.4: InstanceNorm reduces GNN capabilities to compute node degree (Figure 2, considering all graphs as disconnected components in a single graph)
  • Example C.5: LayerNorm-node reduces GNN capabilities to compute node degree (Figure 5)
  • Remark C.6: Deeper networks are also limited
  • Theorem E.1: Existing normalization techniques limit MPNNs’ expressivity
  • proof
  • Proposition E.1: MPNN with Granola can implement MPNN with RNF
  • ...and 5 more
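
To make the kind of limitation named in Example C.1 concrete, the following is a small hypothetical illustration; it is not the construction used in the paper. It assumes every node starts with a constant feature of 1, so one sum-aggregation step yields the node degree, and it applies only the mean-subtraction part of BatchNorm. The helper names (`circulant`, `degree_features`, `subtract_batch_mean`) and the choice of regular circulant graphs are assumptions made for the example.

```python
def circulant(n, offsets):
    """Adjacency matrix of a circulant graph: node i is linked to i +/- k for k in offsets."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in offsets:
            j = (i + k) % n
            A[i][j] = A[j][i] = 1
    return A

def degree_features(A):
    """Sum aggregation over constant input features of 1 gives each node's degree."""
    return [float(sum(row)) for row in A]

def subtract_batch_mean(batch):
    """Subtract the mean taken over all nodes of all graphs in the batch."""
    flat = [x for feats in batch for x in feats]
    mean = sum(flat) / len(flat)
    return [[x - mean for x in feats] for feats in batch]

# Batch A: a 2-regular and a 4-regular graph on 6 nodes.
batch_a = [degree_features(circulant(6, [1])), degree_features(circulant(6, [1, 2]))]
# Batch B: a 3-regular and a 5-regular graph on 6 nodes.
batch_b = [degree_features(circulant(6, [1, 3])), degree_features(circulant(6, [1, 2, 3]))]

centered_a = subtract_batch_mean(batch_a)
centered_b = subtract_batch_mean(batch_b)
# Both batches collapse to the same centered features (-1 for the first graph,
# +1 for the second), so the absolute degrees 2/4 versus 3/5 are no longer
# recoverable from the normalized features alone.
```

After centering, the two batches are indistinguishable even though their node degrees differ, which is the sense in which batch-level mean subtraction can discard degree information in this toy setting.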