Entropy Aware Message Passing in Graph Neural Networks

Philipp Nazari, Oliver Lemke, Davide Guidobene, Artiom Gesp

TL;DR

This work tackles oversmoothing in deep Graph Neural Networks by introducing an entropy-aware message passing mechanism. It defines node energies $E_i$ via the Dirichlet energy and builds an unnormalized Boltzmann distribution $p_i = e^{-E_i/T}$ at temperature $T$; a gradient ascent term on the resulting Shannon entropy $S$ is added to each layer update. A closed-form gradient $\nabla_{\mathbf X_i} S$ keeps the computation efficient, with $\mathcal{O}(m+n)$ complexity. The method is architecture-agnostic and is evaluated on standard benchmarks, where it mitigates oversmoothing comparably to existing baselines; the results also show that alleviating oversmoothing alone does not guarantee state-of-the-art accuracy for deep networks. The authors note that the hyperparameters are sensitive to the task, emphasize the flexibility of the implementation, and provide code for replication. Overall, the approach offers a physics-inspired, scalable regularization that can be integrated with various GNN designs to preserve embedding entropy during learning.
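To make the quantities in the summary concrete, here is a minimal sketch in plain Python. It assumes an unnormalized per-node Dirichlet-style energy, $E_i = \tfrac12 \sum_{j \in N(i)} \lVert \mathbf X_i - \mathbf X_j \rVert^2$; the paper's exact definition may differ (for example by degree normalization), and the function names `node_energies` and `entropy` are hypothetical.

```python
import math

def node_energies(features, edges):
    """Per-node energy E_i = 1/2 * sum over neighbors j of ||x_i - x_j||^2.

    Assumption: unnormalized differences; the paper's energy may be
    degree-normalized. `edges` lists each undirected edge once.
    """
    energies = [0.0] * len(features)
    for i, j in edges:  # one pass over the edges
        sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        energies[i] += 0.5 * sq_dist
        energies[j] += 0.5 * sq_dist
    return energies

def entropy(features, edges, temperature=1.0):
    """Shannon entropy S = -sum_i p_i log p_i with p_i = exp(-E_i / T)."""
    total = 0.0
    for e in node_energies(features, edges):
        p = math.exp(-e / temperature)
        total += (e / temperature) * p  # -p log p = (E/T) * exp(-E/T)
    return total

# Path graph 0 - 1 - 2 with two-dimensional embeddings.
edges = [(0, 1), (1, 2)]
smooth = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # fully oversmoothed
spread = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # distinct embeddings

print(entropy(smooth, edges))  # 0.0: identical embeddings carry no entropy
print(entropy(spread, edges))  # positive
```

Under these assumptions a fully oversmoothed embedding has zero energy at every node and therefore zero entropy, which is the state the entropy term is meant to push away from. Both functions make a single pass over nodes and edges for a fixed feature dimension.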

Abstract

Deep Graph Neural Networks struggle with oversmoothing. This paper introduces a novel, physics-inspired GNN model designed to mitigate this issue. Our approach integrates with existing GNN architectures, introducing an entropy-aware message passing term. This term performs gradient ascent on the entropy during node aggregation, thereby preserving a certain degree of entropy in the embeddings. We conduct a comparative analysis of our model against state-of-the-art GNNs across various common datasets.
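The idea of adding an entropy gradient ascent step to aggregation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the energy is the unnormalized form $E_i = \tfrac12 \sum_{j \in N(i)} \lVert \mathbf X_i - \mathbf X_j \rVert^2$, the gradient is derived for that assumed energy (it is not the paper's Theorem 3.1), mean aggregation stands in for an arbitrary GNN layer, and `step_size` is a hypothetical hyperparameter name.

```python
import math

def energies(x, edges):
    """Assumed energy: E_i = 1/2 * sum over neighbors j of ||x_i - x_j||^2."""
    e = [0.0] * len(x)
    for i, j in edges:
        half_sq = 0.5 * sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
        e[i] += half_sq
        e[j] += half_sq
    return e

def entropy_gradient(x, edges, temperature=1.0):
    """Gradient of S = sum_i (E_i/T) exp(-E_i/T) for the assumed energy.

    With w_i = dS/dE_i = (1/T) exp(-E_i/T) (1 - E_i/T), the chain rule gives
    grad_i = sum over neighbors j of (w_i + w_j) * (x_i - x_j).
    """
    w = [math.exp(-e / temperature) * (1.0 - e / temperature) / temperature
         for e in energies(x, edges)]
    grad = [[0.0] * len(row) for row in x]
    for i, j in edges:
        coeff = w[i] + w[j]
        for d in range(len(x[i])):
            diff = x[i][d] - x[j][d]
            grad[i][d] += coeff * diff
            grad[j][d] -= coeff * diff
    return grad

def mean_aggregate(x, edges):
    """Stand-in for a GNN layer: average each node with its neighbors."""
    sums = [list(row) for row in x]
    counts = [1] * len(x)  # each node counts itself (self-loop)
    for i, j in edges:
        for d in range(len(x[i])):
            sums[i][d] += x[j][d]
            sums[j][d] += x[i][d]
        counts[i] += 1
        counts[j] += 1
    return [[v / c for v in row] for row, c in zip(sums, counts)]

def entropic_layer(x, edges, step_size=0.1, temperature=1.0):
    """Aggregation plus one gradient ascent step on the entropy of the input."""
    agg = mean_aggregate(x, edges)
    grad = entropy_gradient(x, edges, temperature)
    return [[a + step_size * g for a, g in zip(row_a, row_g)]
            for row_a, row_g in zip(agg, grad)]

x = [[0.0, 0.0], [0.3, 0.1], [0.5, -0.2]]
edges = [(0, 1), (1, 2)]
print(entropic_layer(x, edges))
```

In this sketch the gradient at node $i$ is a weighted sum of the vectors $\mathbf X_i - \mathbf X_j$ pointing away from its neighbors. While $E_i < T$ the weights are positive, so the extra term counteracts the contraction caused by averaging.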

Paper Structure

This paper contains 11 sections, 7 theorems, 29 equations, 3 figures, 1 table.

Key Result

Theorem 3.1

The gradient of the Entropy $S$ with respect to $\mathbf X_i$ is

Figures (3)

  • Figure 1: The first panel explains the effect of gradient ascent during aggregation, where a node $\bullet$ is pushed in a direction that is a weighted superposition $\rightarrow$ of the vectors pointing from its neighbors $\bullet$, in a way that leads to a maximal increase in entropy. The second panel shows the contribution to the entropy of a single node $i$, as a function of energy $E_i$ at temperature $T$. The contribution is maximized iff $E_i = T$.
  • Figure 2: Energy as a function of depth evaluated on the nearest neighbor graph of a randomly initialized $10 \times 10$ grid. Entropic GCN ensures constant energy for depth up to $1000$, while basic GCN oversmooths quickly.
  • Figure 3: Energy at each layer for models trained on Cora. The U-shape of the basic GCN's curve could suggest that graph neural networks oversmooth in intermediate layers.
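
The claim in Figure 1 that a node's entropy contribution peaks at $E_i = T$ can be checked with a short calculation. The sketch below assumes the unnormalized weight $p_i = e^{-E_i/T}$ and writes $s(E_i)$ for the single-node contribution $-p_i \log p_i$; the symbol $s$ is introduced here for illustration only.

```latex
s(E_i) = -p_i \log p_i = \frac{E_i}{T}\, e^{-E_i/T},
\qquad
\frac{\mathrm{d}s}{\mathrm{d}E_i}
  = \frac{1}{T}\, e^{-E_i/T} \left( 1 - \frac{E_i}{T} \right).
```

The derivative is positive for $E_i < T$, zero at $E_i = T$, and negative for $E_i > T$, so $E_i = T$ is the unique maximizer, with $s(T) = 1/e$.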

Theorems & Definitions (15)

  • Theorem 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 1.1
  • proof
  • Lemma 1.2
  • proof
  • Lemma 1.3
  • proof
  • ...and 5 more