Entropy Aware Message Passing in Graph Neural Networks

Philipp Nazari; Oliver Lemke; Davide Guidobene; Artiom Gesp

TL;DR

This work tackles oversmoothing in deep Graph Neural Networks with an entropy-aware message passing mechanism. It defines node energies $E_i$ via the Dirichlet energy and builds an unnormalized Boltzmann distribution $p_i = e^{-E_i/T}$ at temperature $T$; a gradient ascent step on the Shannon entropy $S$ of this distribution is added to each layer update. A closed-form gradient $\nabla_{\mathbf X_i} S$ keeps the computation efficient, with $\mathcal{O}(m+n)$ complexity. The method is architecture-agnostic and is evaluated on standard benchmarks, where it mitigates oversmoothing about as well as existing baselines; the results also show that alleviating oversmoothing alone does not guarantee state-of-the-art accuracy in deep networks. The authors note that the hyperparameters are sensitive to the task, emphasize the flexibility of the implementation, and provide code for replication. Overall, the approach offers a physics-inspired, scalable regularization that can be combined with various GNN designs to preserve embedding entropy during learning.

Abstract

Deep Graph Neural Networks struggle with oversmoothing. This paper introduces a novel, physics-inspired GNN model designed to mitigate this issue. Our approach integrates with existing GNN architectures, introducing an entropy-aware message passing term. This term performs gradient ascent on the entropy during node aggregation, thereby preserving a certain degree of entropy in the embeddings. We conduct a comparative analysis of our model against state-of-the-art GNNs across various common datasets.
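
The entropy-aware term described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy is taken as the plain (not degree-normalized) Dirichlet form $E_i = \tfrac12 \sum_{j \in N_i} \lVert \mathbf X_i - \mathbf X_j \rVert^2$, the distribution is the unnormalized $p_i = e^{-E_i/T}$, and the names `node_energies`, `entropy`, `entropy_gradient`, `entropy_aware_update`, and the step size `lam` are hypothetical. Edges are assumed undirected and listed once each.

```python
import numpy as np

def node_energies(X, edges):
    """Assumed energy: E_i = 1/2 * sum over neighbors j of ||X_i - X_j||^2."""
    src, dst = edges[:, 0], edges[:, 1]
    d2 = np.sum((X[src] - X[dst]) ** 2, axis=1)
    E = np.zeros(X.shape[0])
    np.add.at(E, src, 0.5 * d2)  # each undirected edge contributes to both ends
    np.add.at(E, dst, 0.5 * d2)
    return E

def entropy(E, T):
    """S = -sum_i p_i log p_i with unnormalized p_i = exp(-E_i / T)."""
    return float(np.sum((E / T) * np.exp(-E / T)))

def entropy_gradient(X, edges, T):
    """Gradient of S w.r.t. every row of X via the chain rule, one pass over edges."""
    E = node_energies(X, edges)
    w = (1.0 - E / T) * np.exp(-E / T) / T  # dS/dE_k for each node k
    src, dst = edges[:, 0], edges[:, 1]
    coef = (w[src] + w[dst])[:, None]       # edge weight (w_i + w_j)
    diff = X[src] - X[dst]                  # vector pointing from neighbor j to node i
    G = np.zeros_like(X)
    np.add.at(G, src, coef * diff)
    np.add.at(G, dst, -coef * diff)
    return G

def entropy_aware_update(H, X, edges, T=1.0, lam=0.1):
    """Add a gradient-ascent step on S to the output H of any base GNN layer."""
    return H + lam * entropy_gradient(X, edges, T)
```

Here `H` stands for whatever the base layer computes from `X`, which is what makes the term independent of the architecture in this sketch. Each function makes a single pass over the edge list and the node array.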

Paper Structure (11 sections, 7 theorems, 29 equations, 3 figures, 1 table)

Introduction
Constructing An Entropy
Entropy Aware Message Passing
Related Work
Results
Hyperparameter Selection in Entropic GCN
Discussion
Proofs
Proof of Theorem 3.1
Proof of Lemma 3.2
Application To Neural Graph Diffusion

Key Result

Theorem 3.1

Theorem 3.1 gives a closed-form expression for the gradient $\nabla_{\mathbf X_i} S$ of the entropy $S$ with respect to $\mathbf X_i$.
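
A minimal sketch of how such a gradient can be derived, assuming the unnormalized $p_k = e^{-E_k/T}$ from the TL;DR and $S = -\sum_k p_k \log p_k$. This is a chain-rule outline under those assumptions, not the theorem's exact statement:

```latex
\begin{align}
S &= -\sum_k p_k \log p_k = \sum_k \frac{E_k}{T}\, e^{-E_k/T}, \\
\frac{\partial S}{\partial E_k} &= \frac{1}{T}\left(1 - \frac{E_k}{T}\right) e^{-E_k/T}, \\
\nabla_{\mathbf X_i} S &= \sum_k \frac{\partial S}{\partial E_k}\, \nabla_{\mathbf X_i} E_k .
\end{align}
```

The factor $(1 - E_k/T)$ changes sign at $E_k = T$, which matches the statement in the caption of Figure 1 that a node's contribution to the entropy is maximized iff $E_i = T$. If $E_k$ depends only on node $k$ and its neighbors, as a Dirichlet-type energy does, only $k = i$ and the neighbors of $i$ contribute to the last sum.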

Figures (3)

Figure 1: Panel (a) explains the effect of gradient ascent during aggregation, where a node $\bullet$ is pushed in a direction that is a weighted superposition $\rightarrow$ of the vectors pointing from its neighbors $\bullet$, in a way that leads to a maximal increase in entropy. Panel (b) shows the contribution to the entropy of a single node $i$, as a function of energy $E_i$ at temperature $T$. The contribution is maximized iff $E_i = T$.
Figure 2: Energy as a function of depth evaluated on the nearest neighbor graph of a randomly initialized $10 \times 10$ grid. Entropic GCN ensures constant energy for depth up to $1000$, while basic GCN oversmooths quickly.
Figure 3: Energy at each layer for models trained on Cora. The U-shape of the basic GCN's curve could suggest that graph neural networks oversmooth in intermediate layers.
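
The oversmoothing side of Figure 2 can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions rather than the paper's experiment: the graph is a 4-neighbor $10 \times 10$ grid with random node features, a "layer" is weight-free propagation with the symmetric-normalized adjacency with self-loops (no learned weights, no nonlinearity), and the energy is taken as $\operatorname{tr}(\mathbf X^\top (\mathbf I - \hat{\mathbf A}) \mathbf X)$. All function names are hypothetical.

```python
import numpy as np

def grid_adjacency(side):
    """Adjacency matrix of a side x side grid with 4-neighbor connectivity."""
    n = side * side
    A = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:  # right neighbor
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < side:  # neighbor below
                A[i, i + side] = A[i + side, i] = 1.0
    return A

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dirichlet_energy(X, A_norm):
    """Assumed energy: tr(X^T (I - A_norm) X)."""
    return float(np.sum(X * (X - A_norm @ X)))

def energy_by_depth(X, A_norm, depth):
    """Energy before propagation and after each of `depth` propagation steps."""
    energies = [dirichlet_energy(X, A_norm)]
    for _ in range(depth):
        X = A_norm @ X  # weight-free aggregation step
        energies.append(dirichlet_energy(X, A_norm))
    return energies

rng = np.random.default_rng(0)
A_norm = normalized_adjacency(grid_adjacency(10))
energies = energy_by_depth(rng.normal(size=(100, 2)), A_norm, depth=50)
```

Under these assumptions the energy cannot increase from one step to the next, because the eigenvalues of the normalized adjacency have magnitude at most one. That monotone decay is the behavior the entropy term is meant to counteract.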

Theorems & Definitions (15)

Theorem 3.1
proof
Lemma 3.2
proof
Lemma 1.1
proof
Lemma 1.2
proof
Lemma 1.3
proof
...and 5 more
