Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching

Federico Errica; Henrik Christiansen; Viktor Zaverkin; Takashi Maruyama; Mathias Niepert; Francesco Alesiani

TL;DR

Adaptive Message Passing (AMP) introduces a variational framework that enables graph networks to learn both the depth of message passing and which messages to filter, addressing long-range interaction challenges. By modeling depth with a learned distribution over layers $L$ and applying differentiable, soft message filtering via $\boldsymbol{F}$, AMP can form effectively deep yet selective networks, truncated at $\boldsymbol{\hat{L}}$ for tractable inference. The approach yields theoretical insights on mitigating oversmoothing, oversquashing, and underreaching, and empirically improves performance on five long-range datasets spanning synthetic and chemical domains. This framework avoids costly graph rewiring or exhaustive depth searches, offering a principled, scalable route to robust long-range graph reasoning with per-layer readouts and depth-aware predictions.
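The ingredients named in the TL;DR (a depth distribution truncated at a maximum depth, soft message filtering, and per-layer readouts) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the truncated Poisson depth distribution, the sigmoid gate, the sum aggregation, and all names (`truncated_poisson`, `init_params`, `amp_forward`, `w_filter`, `w_msg`, `w_self`, `w_out`) are assumptions made for illustration, and the variational training objective is left out entirely.

```python
import math
import numpy as np

def truncated_poisson(rate, max_depth):
    """Poisson mass on depths 1..max_depth, renormalized after truncation.
    Assumption: a stand-in for the learned depth distribution."""
    mass = np.array([
        math.exp(-rate) * rate**k / math.factorial(k)
        for k in range(1, max_depth + 1)
    ])
    return mass / mass.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(num_layers, dim, seed=0):
    """Random weights for each layer (hypothetical parameter names)."""
    rng = np.random.default_rng(seed)
    return [
        {
            "w_filter": rng.normal(scale=0.5, size=(dim, dim)),
            "w_msg": rng.normal(scale=0.5, size=(dim, dim)),
            "w_self": rng.normal(scale=0.5, size=(dim, dim)),
            "w_out": rng.normal(scale=0.5, size=dim),
        }
        for _ in range(num_layers)
    ]

def amp_forward(adj, x, params, depth_probs):
    """Depth-weighted sum of per-layer graph readouts with soft filtering."""
    h = x
    prediction = 0.0
    for layer, q_l in zip(params, depth_probs):
        # Soft filter in (0, 1): one gate per node and feature, applied
        # to the embedding before it is sent to the neighbors.
        gate = sigmoid(h @ layer["w_filter"])
        messages = adj @ (gate * h)  # sum of filtered neighbor messages
        h = np.tanh(messages @ layer["w_msg"] + h @ layer["w_self"])
        # Per-layer readout: mean-pool the nodes, then a linear map.
        readout = h.mean(axis=0) @ layer["w_out"]
        prediction += q_l * readout  # weight by the probability of this depth
    return float(prediction)

# Path graph on four nodes with one-hot node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
depth_probs = truncated_poisson(rate=2.0, max_depth=3)
params = init_params(num_layers=3, dim=4)
y = amp_forward(adj, x, params, depth_probs)
```

In this sketch, putting all probability mass on a single depth recovers the readout of a fixed-depth network, while a spread-out distribution mixes the predictions of several depths.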

Abstract

Long-range interactions are essential for the correct description of complex systems in many scientific fields. The price to pay for including them in the calculations, however, is a dramatic increase in the overall computational costs. Recently, deep graph networks have been employed as efficient, data-driven models for predicting properties of complex systems represented as graphs. These models rely on a message passing strategy that should, in principle, capture long-range information without explicitly modeling the corresponding interactions. In practice, most deep graph networks cannot really model long-range dependencies due to the intrinsic limitations of (synchronous) message passing, namely oversmoothing, oversquashing, and underreaching. This work proposes a general framework that learns to mitigate these limitations: within a variational inference framework, we endow message passing architectures with the ability to adapt their depth and filter messages along the way. With theoretical and empirical arguments, we show that this strategy better captures long-range interactions, by competing with the state of the art on five node and graph prediction datasets.

Paper Structure (48 sections, 4 theorems, 50 equations, 9 figures, 6 tables)

Introduction
Related Work
Oversquashing.
Oversmoothing.
Adaptive Architectures.
Adaptive Message Passing
Definitions.
Multi-output Family of Architectures.
Variational Inference
AMP Formulation.
Choice of the Variational Distributions.
Computation of the ELBO.
Practical Considerations.
Computational Considerations
Experimental Details
...and 33 more sections

Key Result

Theorem 3.1

For AMP with $m$ layers and $u, v \in \mathcal{V}$, Here, MPNN is in the following form where $\mathrm{up}$, $\mathrm{rs}$, and $\mathrm{mp}$ are Lipschitz functions as in di_over_2023 with constants $c_{\mathrm{up}}, c_{\mathrm{rs}}, c_{\mathrm{mp}}$, $c_F$ is the upper bound of the entry-wise $L^1$ matrix norm of $\frac{\partial F}{\partial x}$ for the filtering function $F$, $k_h$ is

Figures (9)

Figure 1: The graphical model of AMP, where white and blue circles denote, respectively, latent and observable random variables. $\Theta_{\ell}$ is the random variable over the parameters of layer $\ell$, $\mathcal{F}_i$ defines a distribution over the message filters, $\mathcal{L}$ implements a distribution over the layers of the architecture, while $\mathcal{G}_i$ and $\mathcal{T}_i$ are distributions over the (observable) input graph and the target label, respectively.
Figure 2: Given an input graph (a) and a discrete message filtering scheme (b), we observe how an $L=2$-layer standard message passing (c) differs from AMP (d) in terms of the number of messages sent. Please refer to the text for more details.
Figure 3: We show the Dirichlet energy (left) and the sensitivity (right) across layers for the GCN model and its AMP version.
Figure 4: We show the distribution learned by the best configurations of each base model on the synthetic and chemical datasets.
Figure 5: (Top right) We visualize the amount of information preserved in each layer by AMP$_{\textsc{GCN}}$. (Others) Ablation study of the message filtering scheme: if a point lies in the gray area, then filtering is not beneficial.
...and 4 more figures
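
For readers unfamiliar with the Dirichlet energy plotted in Figure 3, a minimal sketch of one common, unnormalized definition follows: the sum of squared embedding differences over the edges. This is an assumption for illustration; the paper may use a degree-normalized or rescaled variant, and the function name `dirichlet_energy` is hypothetical.

```python
import numpy as np

def dirichlet_energy(edges, x):
    """Sum over undirected edges (each listed once) of ||x_i - x_j||^2."""
    src, dst = np.asarray(edges).T
    diff = x[src] - x[dst]
    return float((diff ** 2).sum())

# Path graph 0 - 1 - 2 with scalar node embeddings.
edges = [(0, 1), (1, 2)]
varied = np.array([[0.0], [1.0], [3.0]])
constant = np.ones((3, 1))

energy_varied = dirichlet_energy(edges, varied)
energy_constant = dirichlet_energy(edges, constant)
```

An energy of zero means all connected nodes share the same embedding, which is the signature of oversmoothing; larger values indicate that neighboring embeddings remain distinguishable.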

Theorems & Definitions (11)

Theorem 3.1
proof
Theorem 3.2: Short Version
proof
Definition 2.1
Theorem 2.2
proof
Theorem 2.3
proof
proof
...and 1 more
